Abstract

Discrete-time Rayleigh fading single-input single-output (SISO) and multiple-input multiple- 
output (MIMO) channels are considered, with no channel state information at the transmitter or the 
receiver. The fading is assumed to be stationary and correlated in time, but independent from antenna 
to antenna. Peak-power and average-power constraints are imposed on the transmit antennas. For 
MIMO channels, these constraints are either imposed on the sum over antennas, or on each individual 
antenna. For SISO channels and MIMO channels with sum power constraints, the asymptotic capacity 
as the peak signal-to-noise ratio goes to zero is identified; for MIMO channels with individual power 
constraints, this asymptotic capacity is obtained for a class of channels called transmit separable 
channels. The results for MIMO channels with individual power constraints are carried over to SISO 
channels with delay spread (i.e. frequency selective fading). 
I. Introduction

In this paper we present results on the capacity of discrete-time Rayleigh fading single-input
single-output (SISO) and multiple-input multiple-output (MIMO) channels. We assume
a noncoherent model where no channel state information is available at the transmitter or the
receiver. The fading processes are assumed to be stationary and correlated in time but, for MIMO
channels, independent for distinct input/output antenna pairs. A hard peak-power constraint, in 
addition to an average-power constraint, is imposed on the input signals. For MIMO channels 
we consider two types of constraints: under one the peak and average power constraints are 
imposed on each of the signals transmitted by the different antennas separately, and under 
the other the constraints are on the sum of the powers in the different signals. We focus on 
channel capacity at low signal-to-noise ratio (SNR), but we also derive upper bounds that are 
valid for any SNR. 

We also consider SISO channels with delay spread (i.e., frequency selective) fading where 
the fading is modeled by a finite number of taps. The fading processes corresponding to the 
different taps are assumed to be independent across taps, and allowed, within each tap, to be 
correlated in time. 

The capacity of fading channels at low SNR was studied in [4-18]. The main motivation 
for our present work has been to understand the capacity of communication over wideband 
channels. The work of Kennedy [8], Jacobs [7], Telatar and Tse [15], and Durisi et al. [5]
demonstrates that the capacity of such channels, in the wideband limit, is the same as for
a wideband additive Gaussian noise channel with no fading, but the input signals, such as
M-ary FSK, are highly bursty in the frequency domain or time domain. The work of Medard 
and Gallager [10] (also see [14]) shows that if the burstiness of the input signals is limited 
in both time and frequency, then the capacity of such wideband channels becomes severely 
limited. In particular, the required energy per bit converges to infinity. 

Wireless wideband channels typically include both time and frequency selective fading. One 
approach to modeling such channels is to partition the frequency band into narrow subbands, 
so that the fading is flat, but time-varying, within each subband. If the width of the subbands is 
approximately the coherence bandwidth of the channel, then they will experience approximately
independent fading. The flat fading models used in this paper can be considered to be models 
for communication over a subband of a wideband wireless fading channel. The peak-power 
constraints that we impose on the signals can then be viewed as burstiness constraints in 
both the time and frequency domain for wideband communication, similar to those of [10, 
14]. However, in this paper, we consider hard peak constraints, rather than fourth moment 
constraints as in [10, 14], and we consider the use of multiple antennas. 

The recent work of Srinivasan and Varanasi [13] is closely related to this paper. It gives low 
SNR asymptotics of the capacity of MIMO channels with no side information for block fading 
channels, with peak and average-power constraints, with the peak constraints being imposed on 
individual antennas. One difference between [13] and this paper is that we assume continuous 
fading rather than block fading. In addition, we provide upper bounds on capacity rather 
than only asymptotic bounds as in [12, 13]. We assume, however, that the fading processes 
are Rayleigh distributed, whereas the asymptotic bounds do not require such distributional 
assumption. The work of Rao and Hassibi [12] is also related to this paper. It gives low 
SNR asymptotics of the capacity of MIMO channels with no side information for block 
fading channels, but the peak constraints are imposed on coefficients in a particular signal 
representation, rather than as hard constraints on the transmitted signals. 

The model in this paper considers both a peak constraint and an average power constraint. 
Upper bounds are given on the capacity which are valid for any ratio of these constraints, 
but the low SNR asymptotics focuses only on the case where the ratio is constant. The ratio 
is also held constant in the asymptotic analysis of Srinivasan and Varanasi [13]. The paper
of Wu and Srikant [18] focuses on the asymptotic capacity and error exponent for a fixed
peak constraint, as the average power goes to zero. The paper of Zheng et al. [19] considers a
general scaling of the peak constraint to average power constraint, with the scaling depending
also on the coherence time. For a fixed ratio of peak constraint to average power constraint, the
capacity scales quadratically as SNR converges to zero, whereas for a fixed peak constraint,
the capacity scales linearly with SNR as SNR converges to zero. Cases between these two
extremes are investigated in [19]. For wideband cellular systems using OFDM modulation, 
the peak constraint is usually expressed in the time domain, because of the limitations on 
the linear range of transmit power amplifiers. In such a case, the peak power constraint in
a particular frequency is not severe, so letting the peak constraint be constant or letting it 
converge to zero more slowly than the average power may be most appropriate. In cases in 
which interference with other users within the same band is especially important, for example 
for use of unlicensed or secondary spectrum, a peak constraint of the same order of magnitude 
as the average power constraint, as considered in this paper, may be the most relevant. The 
papers [13, 18, 19] consider block fading channels, whereas a stationary, correlated fading 
channel model is adopted here. 

The capacity of noncoherent stationary flat fading channels at high SNR was studied in 
[20-23], and the capacity of delay spread channels at high SNR was recently studied in [24]. 
For regular fading processes [20] demonstrated a connection between the high-SNR capacity 
growth and the error in predicting the fading process from noiseless observations of its past, 
whereas for nonregular fading [22] demonstrated such a connection to the error in predicting
the fading process from noisy observations of its past in the low observation noise regime. In 
this paper we point to an analogous connection between the low SNR asymptotic capacity and 
the error in predicting the fading process from very noisy observations of its past. We show 
that these prediction errors in the high observation noise regime determine the asymptotic low 
SNR capacity of SISO channels and MIMO channels with sum power constraints. They also 
determine the capacity of a class of MIMO channels satisfying a certain separability condition. 
Our results on delay spread channels follow from those on MIMO channels with individual 
power constraints. 

The rest of this paper is arranged as follows. In Section II we describe the channel models 
that are considered in this paper and present the main capacity results obtained with these 
channel models. In Sections III to VI the capacity results are proved, in some cases by
exhibiting additional capacity bounds.



II. Channel Models and Main Results 

We study four types of channels: SISO channels, MIMO channels with sum (across transmit 
antennas) power constraints, MIMO channels with individual (per transmit antenna) power 
constraints, and SISO channels with delay spread. In this section we shall describe these 
models and present our results on their capacities. 



A. SISO channels 

We begin with the SISO channel, which models noncoherent discrete-time single-antenna 
communication over time-selective flat fading channels. 

1) The Model: The time-$k$ complex-valued output $Y_k \in \mathbb{C}$ of the SISO channel is given by

$$Y_k = \sqrt{\rho}\, H_k Z_k + W_k, \tag{1}$$

where $Z_k \in \mathbb{C}$ is the time-$k$ channel input; the SNR $\rho$ is a positive scaling constant; the complex
stochastic process $\{H_k\}$ is the multiplicative fading process; and the complex stochastic process
$\{W_k\}$ models additive noise.

We assume that the processes $\{H_k\}$ and $\{W_k\}$ are independent and that their joint law does
not depend on the input sequence $\{Z_k\}$. The additive noise sequence $\{W_k\}$ is a sequence of
IID proper complex normal (PCN) random variables of mean zero and variance one. Such a
distribution is denoted by $\mathcal{N}_{\mathbb{C}}(0, 1)$. The fading process $\{H_k\}$ is assumed to be a zero-mean,
unit-variance, stationary, PCN process. We denote its autocorrelation function by $R(\cdot)$ and
assume that it has a spectral density function $S(\cdot)$. Thus

$$R(m) \triangleq E[H_{k+m} H_k^*] = \int_{-\pi}^{\pi} S(\omega)\, e^{i\omega m}\, \frac{d\omega}{2\pi}, \quad k, m \in \mathbb{Z},$$

and, in particular,

$$R(0) = E[|H_k|^2] = \int_{-\pi}^{\pi} S(\omega)\, \frac{d\omega}{2\pi} = 1, \quad k \in \mathbb{Z}.$$

Note that the existence of its spectral density function implies that $\{H_k\}$ is ergodic. We shall
assume throughout that the autocorrelation is square-summable, i.e., that

$$\sum_{\nu=-\infty}^{\infty} |R(\nu)|^2 < \infty, \tag{2}$$

and define

$$\lambda \triangleq \sum_{\nu=-\infty}^{\infty} |R(\nu)|^2. \tag{3}$$



The input is simultaneously subjected to two power constraints: a peak-power constraint
and an average-power constraint. The peak-power constraint is that the time-$k$ channel input
$Z_k$ must satisfy, with probability one,

$$|Z_k| \le 1, \quad k \in \mathbb{Z}. \tag{4}$$

The average-power constraint is that

$$E[|Z_k|^2] \le \frac{1}{\beta}, \quad k \in \mathbb{Z}, \tag{5}$$

where the peak-to-average ratio $\beta$ is some constant satisfying $\beta \ge 1$ and is the ratio of the
maximum allowed peak power to the maximum allowed average power. Since the average of a
random variable cannot exceed its maximal value, it follows that (4) implies $E[|Z_k|^2] \le 1$, so
that setting $\beta = 1$ renders the average-power constraint inactive and thus reduces the problem
to one of communication subject to a peak-power constraint only.
The capacity of this channel is given by

$$C(\rho, \beta) = \lim_{n \to \infty} \frac{1}{n} \sup I(Z_1^n; Y_1^n),$$

where the supremum is taken over all joint distributions on $Z_1^n$ satisfying the peak-power
constraint (4) and the average-power constraint (5). The square-summability assumption (2),
together with the assumption that $\{H_k\}$ is PCN, implies that the random process $\{H_k\}$ is
weakly mixing (in fact, mixing). Therefore, a coding theorem exists for $C(\rho, \beta)$, based on
notions surrounding information stability (see [25, 26]) and the Shannon-McMillan-Breiman
theorem for finite-alphabet ergodic sources. See [16] for details. Roughly speaking, the
operational meaning of $C(\rho, \beta)$ is that for any rate $R$ (expressed in bits per channel use) less
than the capacity, there exists a sequence of codes with blocklength converging to infinity,
such that each code meets the peak and average-power constraints, each code has $2^{nR}$ codewords,
where $n$ is the length of the code, and the probability of decoding any codeword incorrectly
converges to zero. The average-power constraint can be imposed on the expectation (over a
uniformly chosen codeword) or on the maximum (over all the codewords) of the normalized
(by the blocklength) energy of the codeword.
We define $c(\beta)$ as the limiting ratio

$$c(\beta) = \lim_{\rho \downarrow 0} \frac{C(\rho, \beta)}{\rho^2} \tag{6}$$

when the limit exists. We next present our results on $C(\rho, \beta)$.

2) Results on SISO Fading: Our first result gives the asymptotic capacity of the SISO 
channel. 

Proposition 2.1 (Asymptotic Capacity): For any $\beta \ge 1$, the limit in (6) exists and is given
by

$$c(\beta) = \frac{1}{2} \max_{0 \le a \le 1/\beta} \{a\lambda - a^2\}. \tag{7}$$
By evaluating the RHS of (7) for the special case where $\beta = 1$, i.e., when the average-power
constraint is inactive, we obtain the following corollary.

Corollary 2.1 (Asymptotic Capacity, No Average Power Constraint): Under the peak-power
constraint (4) only,

$$c(1) = \begin{cases} \dfrac{\lambda^2}{8}, & \text{if } \lambda \le 2, \\[4pt] \dfrac{\lambda - 1}{2}, & \text{if } \lambda > 2. \end{cases} \tag{9}$$
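The maximization in (7) is a concave quadratic in $a$ over an interval, so it can be evaluated in closed form by clipping the unconstrained maximizer $\lambda/2$ to $[0, 1/\beta]$. The following Python sketch illustrates this; the function name `asymptotic_capacity` is ours, and the code assumes unit-variance fading (so that $\lambda \ge 1$).

```python
def asymptotic_capacity(lam: float, beta: float = 1.0) -> float:
    """Evaluate c(beta) = (1/2) * max over 0 <= a <= 1/beta of (a*lam - a**2), as in (7).

    lam  : sum of squared autocorrelation magnitudes, as in (3) (assumed >= 1)
    beta : peak-to-average power ratio (assumed >= 1)
    """
    if lam < 1.0 or beta < 1.0:
        raise ValueError("expected lam >= 1 (unit-variance fading) and beta >= 1")
    # The objective a*lam - a**2 is concave with unconstrained maximizer lam/2,
    # so the constrained maximizer is lam/2 clipped to the interval [0, 1/beta].
    a_star = min(lam / 2.0, 1.0 / beta)
    return 0.5 * (a_star * lam - a_star ** 2)


if __name__ == "__main__":
    # With beta = 1 the two branches correspond to lam <= 2 and lam > 2.
    for lam in (1.0, 1.5, 2.0, 3.0):
        print(lam, asymptotic_capacity(lam, beta=1.0))
```

With $\beta = 1$ the clipped maximizer is $\lambda/2$ when $\lambda \le 2$ and $1$ otherwise, which gives the values $\lambda^2/8$ and $(\lambda - 1)/2$ respectively.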

Motivated by the different asymptotic behaviors of channel capacity that occur depending
on whether $\lambda \le 2$ or $\lambda > 2$, we introduce the following definition.

Definition 2.1: A zero-mean discrete-time PCN stationary process $\{H_k\}$ (not necessarily of
unit variance) is ephemeral if its autocorrelation function $R(\cdot)$ satisfies

$$\sum_{\nu=-\infty}^{\infty} |R(\nu)|^2 \le 2R^2(0). \tag{10}$$

Otherwise, $\{H_k\}$ is nonephemeral.

Note that if the fading process $\{H_k\}$ is of unit variance, then $R(0) = 1$ and $\{H_k\}$ is ephemeral
if $\lambda \le 2$, where $\lambda$ is defined in (3). When the fading process in the SISO fading channel (1)
is ephemeral we consider the channel itself to also be ephemeral. Otherwise, we consider the
channel to be nonephemeral.
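As an illustration of the ephemeral/nonephemeral distinction, the sketch below computes the sum of squared autocorrelation magnitudes from a truncated autocorrelation sequence and tests whether it exceeds $2R^2(0)$. The helper names and the Gauss-Markov autocorrelation $R(\nu) = r^{|\nu|}$ used as an example are our assumptions; truncating the sum at a finite lag is an approximation.

```python
def lambda_from_autocorrelation(r_nonneg):
    """Sum over all integer lags of |R(v)|**2, given [R(0), R(1), ..., R(K)].

    Uses |R(-v)| = |R(v)|, which holds for any stationary process, and
    truncates the sum at lag K (an approximation).
    """
    return abs(r_nonneg[0]) ** 2 + 2.0 * sum(abs(x) ** 2 for x in r_nonneg[1:])


def is_ephemeral(r_nonneg):
    """True if the (truncated) sum of |R(v)|**2 is at most 2 * R(0)**2."""
    return lambda_from_autocorrelation(r_nonneg) <= 2.0 * abs(r_nonneg[0]) ** 2


def gauss_markov_autocorrelation(coeff, max_lag):
    """Hypothetical example: R(v) = coeff**|v| for v = 0, ..., max_lag."""
    return [coeff ** v for v in range(max_lag + 1)]


if __name__ == "__main__":
    for coeff, lags in ((0.5, 200), (0.99, 5000)):
        r = gauss_markov_autocorrelation(coeff, lags)
        print(coeff, lambda_from_autocorrelation(r), is_ephemeral(r))
```

For this example family the sum equals $(1 + r^2)/(1 - r^2)$, so the process is ephemeral exactly when $r^2 \le 1/3$; a slowly varying channel such as $r = 0.99$ is nonephemeral.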

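In addition to asymptotic expansions, we provide a firm upper bound on $C(\rho, \beta)$:
Proposition 2.2 (A Firm Upper Bound on Capacity): For any $\rho > 0$ and $\beta \ge 1$,

$$C(\rho, \beta) \le U(\rho, \beta), \tag{11}$$

where

$$U(\rho, \beta) \triangleq \log\bigl(1 + \rho\, \zeta(\rho, \beta)\bigr) - \zeta(\rho, \beta)\, I(\rho), \tag{12}$$

$$\zeta(\rho, \beta) \triangleq \min\left\{ \frac{1}{\beta},\; \frac{1}{I(\rho)} - \frac{1}{\rho} \right\}, \tag{13}$$

and

$$I(\rho) = \int_{-\pi}^{\pi} \log\bigl(1 + \rho S(\omega)\bigr)\, \frac{d\omega}{2\pi}. \tag{14}$$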
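The bound of Proposition 2.2 is straightforward to evaluate numerically once the spectral density is known. The sketch below is a minimal illustration under stated assumptions: it takes $\zeta = \min\{1/\beta,\ 1/I(\rho) - 1/\rho\}$ as in (13), uses natural logarithms (so values are in nats), approximates the integral in (14) by an average over a uniform grid, and uses a Gauss-Markov spectral density with $R(\nu) = r^{|\nu|}$ as a hypothetical example. All function names are ours.

```python
import math


def gauss_markov_psd(r):
    """Spectral density of a unit-variance process with R(v) = r**|v| (example only)."""
    return lambda w: (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(w) + r * r)


def info_integral(rho, psd, n_grid=4096):
    """I(rho): average of log(1 + rho * S(w)) over one period, via a uniform grid."""
    total = 0.0
    for j in range(n_grid):
        w = -math.pi + 2.0 * math.pi * j / n_grid
        total += math.log1p(rho * psd(w))
    return total / n_grid


def upper_bound(rho, beta, psd):
    """U(rho, beta) = log(1 + rho*zeta) - zeta*I(rho), with zeta as in (13)."""
    i_rho = info_integral(rho, psd)
    zeta = min(1.0 / beta, 1.0 / i_rho - 1.0 / rho)
    return math.log1p(rho * zeta) - zeta * i_rho


if __name__ == "__main__":
    psd = gauss_markov_psd(0.5)
    for rho in (1e-3, 1e-2, 1e-1, 1.0):
        print(rho, upper_bound(rho, 1.0, psd) / rho ** 2)
```

For small $\rho$ the printed ratio $U(\rho, 1)/\rho^2$ should approach the value given by (7), which offers a quick sanity check of an implementation.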

It is interesting to note that, in general, IID input distributions do not achieve the same
asymptotic behavior as channel capacity. This is best seen in the next proposition on the
asymptotic behavior of the mutual information corresponding to IID inputs. We first define

$$c_{\mathrm{IID}}(\beta) = \lim_{\rho \downarrow 0} \frac{1}{\rho^2} \left( \lim_{n \to \infty} \sup \frac{1}{n} I(Z_1^n; Y_1^n) \right) \tag{15}$$

if the limit exists, where the supremum is over all IID distributions on $Z_1^n$ satisfying (4) and
(5).

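Proposition 2.3 (Asymptotic Rates for IID Inputs): If the autocorrelation function $R(\cdot)$ is
absolutely summable, i.e.,

$$\sum_{\nu=-\infty}^{\infty} |R(\nu)| < \infty, \tag{16}$$

then the limit in (15) exists and is given by

$$c_{\mathrm{IID}}(\beta) = \begin{cases} \dfrac{1}{2\beta} + \dfrac{\lambda - 2}{2\beta^2}, & \text{if } \lambda \ge 2 - \dfrac{\beta}{2}, \\[6pt] \dfrac{1}{8(2 - \lambda)}, & \text{if } \lambda < 2 - \dfrac{\beta}{2}. \end{cases} \tag{17}$$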
Using this proposition we see that, subject to (16) (which is more stringent than (2)), IID
inputs achieve the asymptotic behavior of channel capacity only if $\lambda = 1$ (in which case the
channel is memoryless) or when the two conditions $\beta = 1$ and $\lambda \ge 2$ are both met. Figure 1
depicts $c(1.5)$ and $c_{\mathrm{IID}}(1.5)$ as functions of $\lambda$.




Fig. 1. Comparison of $c(1.5)$ (asymptotic capacity) and $c_{\mathrm{IID}}(1.5)$ (asymptotic information rates achievable with IID input
symbols) as functions of $\lambda$. Peak-to-average ratio $\beta = 1.5$.

3) Discussion: A few remarks about the results are called for. 

• Capacity and Prediction: The error in predicting the time-zero fading $H_0$ based on the
previous values of the fading $\{H_\nu\}_{\nu=-\infty}^{-1}$ was shown in [20] to be related to the high-SNR
asymptotic behavior of channel capacity. If this prediction error is strictly positive, then
capacity at high SNR grows double logarithmically in the SNR. In cases where this
prediction error is zero, a finer analysis of the prediction problem is needed to establish
the high-SNR capacity asymptotics [22]. Indeed, when the prediction error is zero, the
capacity asymptotics are determined by the behavior of the noisy prediction error. This
noisy prediction error $\sigma^2(\rho)$ is defined as the mean squared error in predicting $H_0$ based
on $(\ldots, H_{-2} + N_{-2}, H_{-1} + N_{-1})$, where $\{N_k\}$ are IID $\mathcal{N}_{\mathbb{C}}(0, 1/\rho)$ and independent of
$\{H_k\}$. Furthermore, $\sigma^2(0)$ is assigned the value $R(0)$, corresponding to the limiting case
of estimating $H_0$ in the absence of past information. The noisy prediction error is given by
classical formulas for optimal prediction of stationary random processes (see [22] for details):

$$\sigma^2(\rho) = \exp\left\{ \int_{-\pi}^{\pi} \log\left( \frac{1}{\rho} + S(\omega) \right) \frac{d\omega}{2\pi} \right\} - \frac{1}{\rho}, \quad \rho > 0. \tag{18}$$

Here we note that the noisy prediction error also determines the asymptotic behavior of
channel capacity at low SNR. Indeed, by Proposition 2.1, the low-SNR asymptotics are
determined by $\lambda$, which is defined in (3) and is related to the behavior of the noisy
prediction error in the following way: the Taylor series expansion of $\sigma^2(\rho)$ is

$$\sigma^2(\rho) = R(0) - \frac{\lambda - R^2(0)}{2}\, \rho + o(\rho), \tag{19}$$

where the notation $o(\cdot)$ is used in the sense that $\lim_{x \to 0} o(x)/x = 0$. We further note that
$I(\rho)$, defined in (14), can be expressed as

$$I(\rho) = \log\bigl(1 + \rho\, \sigma^2(\rho)\bigr), \tag{20}$$

and (19) is equivalent to

$$I(\rho) = R(0)\, \rho - \frac{\lambda}{2}\, \rho^2 + o(\rho^2). \tag{21}$$

For fading processes that have unit variance, (19) and (20) still hold, while (21) becomes

$$I(\rho) = \rho - \frac{\lambda}{2}\, \rho^2 + o(\rho^2). \tag{22}$$

The proof of (19), (21) and (22) is a straightforward application of the second-order
Taylor expansion of the function $x \mapsto \log(1 + x)$,

$$\log(1 + x) = x - \frac{x^2}{2} + o(x^2),$$

and the monotone convergence theorem.
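The relations among the noisy prediction error, $I(\rho)$, and $\lambda$ can be checked numerically. The sketch below evaluates the noisy prediction error as $\exp\{\int \log(1/\rho + S(\omega))\, d\omega/2\pi\} - 1/\rho$ on a uniform frequency grid and compares it with the first-order expansion $R(0) - (\lambda - R^2(0))\rho/2$. The function names are ours, and the unit-variance Gauss-Markov spectral density with $R(\nu) = r^{|\nu|}$ is a hypothetical example.

```python
import math


def gauss_markov_psd(r):
    """Spectral density of a unit-variance process with R(v) = r**|v| (example only)."""
    return lambda w: (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(w) + r * r)


def _grid(n_grid):
    """Uniform grid on [-pi, pi); averaging over it approximates d(omega)/(2*pi)."""
    return [-math.pi + 2.0 * math.pi * j / n_grid for j in range(n_grid)]


def noisy_prediction_error(rho, psd, n_grid=4096):
    """exp{ average of log(1/rho + S(w)) } - 1/rho, for rho > 0."""
    mean_log = sum(math.log(1.0 / rho + psd(w)) for w in _grid(n_grid)) / n_grid
    return math.exp(mean_log) - 1.0 / rho


def info_integral(rho, psd, n_grid=4096):
    """I(rho): average of log(1 + rho * S(w)) over one period."""
    return sum(math.log1p(rho * psd(w)) for w in _grid(n_grid)) / n_grid


if __name__ == "__main__":
    r = 0.5
    psd = gauss_markov_psd(r)
    lam = (1.0 + r * r) / (1.0 - r * r)  # sum of r**(2|v|) over all integer v
    for rho in (0.001, 0.01, 0.1):
        exact = noisy_prediction_error(rho, psd)
        first_order = 1.0 - 0.5 * (lam - 1.0) * rho
        print(rho, exact, first_order)
```

The two printed columns should agree to within a term of order $\rho^2$, and $\log(1 + \rho\,\sigma^2(\rho))$ should reproduce $I(\rho)$ up to numerical error.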

• Input Distributions: The proof of the achievability part of Proposition 2.1 demonstrates
that the low-SNR asymptotics of channel capacity can be achieved by considering joint
distributions on $Z_1, \ldots, Z_n$ of the form

$$Z_k = U \cdot \Phi_k, \quad 1 \le k \le n, \tag{23}$$

where $U$ is a random variable taking values in $\{0, 1\}$ and where the sequence $\{\Phi_k\}$
is independent of $U$ and consists of zero-mean modulus-1 random variables that are
uncorrelated. The amplitude modulation component of the optimal signaling strategy is
captured in the law of $U$, and the phase modulation component is captured in the law of
$\{\Phi_k\}$. Some examples of distributions on $\{\Phi_k\}$ are the following:

i) $\Phi_k = \exp(i \cdot k\Theta)$, where $i = \sqrt{-1}$ and where $\Theta$ is a discrete random variable
uniformly distributed over the set $\{2\pi j/m : j \in \{0, \ldots, m-1\}\}$ for some integer $m > 1$.
This is $m$-ary frequency shift keying (FSK);

ii) $\{\Phi_k\}$ are IID random variables uniform over the set $\{\exp(i \cdot 2\pi j/m) : j \in \{0, \ldots, m-1\}\}$
for some integer $m > 1$. This is $m$-ary phase shift keying (PSK);

iii) $\{\Phi_k\}$ are IID random variables uniformly distributed over the set $\{\exp(i\theta) : \theta \in [0, 2\pi)\}$.
This is also a form of PSK.

In practice, the signal of duration $n$ described in (23) would be considered as a single
symbol, and, as is usual in the theory of channel coding, longer random codewords
would be composed of many independent length-$n$ symbols. Note that even when $\{\Phi_k\}$
are IID (as when PSK is used), the random variables $\{Z_k\}$ need not be IID because $\{Z_k\}$
all have the same magnitude (namely, $U$). Thus, whenever $U$ is not deterministic, the
sequence $\{Z_k\}$ is not IID. The fact that our proposed input distribution (23) does not
render $\{Z_k\}$ IID should not be surprising because IID inputs do not typically achieve the
low-SNR channel capacity asymptotics; see Proposition 2.3. When the fading channel
is not memoryless, IID inputs achieve the asymptotic capacity only if there is no
average-power constraint and there is sufficient memory in the channel ($\lambda \ge 2$). Also,
when there is sufficient memory in the channel, amplitude modulation (nondeterministic
$U$) is needed whenever the average and peak power constraints differ. The ON-OFF ratio
of $U$ is then determined by the ratio of the average to peak power constraints.
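The on-off signaling structure described above can be sketched in a few lines of Python. This is an illustration only: the function name `sample_on_off_input`, the parameter `p_on` (the probability that $U = 1$), and the choice to index the block by $k = 1, \ldots, n$ are our assumptions. For the FSK variant, the phases are zero-mean and mutually uncorrelated across the block when $m > n$, which the code does not enforce.

```python
import cmath
import math
import random


def sample_on_off_input(n, p_on, m, scheme="fsk", rng=None):
    """Draw one length-n block Z_k = U * Phi_k, k = 1..n.

    U is 1 with probability p_on and 0 otherwise (one draw for the whole block).
    scheme="fsk": Phi_k = exp(i*k*Theta), Theta uniform on {2*pi*j/m : j = 0..m-1}.
    scheme="psk": Phi_k IID uniform on the m-th roots of unity.
    """
    rng = rng or random.Random()
    u = 1 if rng.random() < p_on else 0
    if scheme == "fsk":
        theta = 2.0 * math.pi * rng.randrange(m) / m
        phases = [cmath.exp(1j * k * theta) for k in range(1, n + 1)]
    elif scheme == "psk":
        phases = [cmath.exp(2j * math.pi * rng.randrange(m) / m) for _ in range(n)]
    else:
        raise ValueError("scheme must be 'fsk' or 'psk'")
    # Every symbol in the block shares the same amplitude U, so the block is
    # not IID whenever U is random, even if the phases themselves are IID.
    return [u * p for p in phases]


if __name__ == "__main__":
    block = sample_on_off_input(n=8, p_on=0.5, m=16, scheme="fsk", rng=random.Random(0))
    print([round(abs(z), 6) for z in block])
```

Each block is either entirely silent or entirely at peak amplitude, so the per-symbol average power equals `p_on` while the peak constraint $|Z_k| \le 1$ always holds.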

• Relation to STORM: The FSK version (i.e. case (i) above) of the input distribution (23) 
is the single antenna special case of the space time orthogonal rank one modulation 
(STORM) input distribution, proposed for MIMO block fading channels by Srinivasan 
and Varanasi [13]. The distribution is used differently here. In [13] the parameter $n$ is
taken to equal the block length of the channel. For the stationary fading considered here,
the capacity asymptote is approached by letting $n \to \infty$.
• PSK Inputs: Zhang & Laneman [9] studied the low-SNR asymptotic behavior of the
information rates that can be achieved on our channel when PSK inputs are used. In the
language of (23), PSK inputs correspond to choosing $U$ in (23) to be deterministically
equal to one and $\{\Phi_k\}$ to be a sequence of PSK input symbols (as described in ii) or iii)
above). (The constellation of the PSK does not affect the asymptotic information rate.)
The asymptotic rates achieved by PSK (with $\beta = 1$) were derived in [9] and are given by

$$c_{\mathrm{PSK}} = \lim_{\rho \downarrow 0} \frac{C_{\mathrm{PSK}}(\rho)}{\rho^2} = \frac{\lambda - 1}{2}, \quad (\beta = 1), \tag{24}$$

where $C_{\mathrm{PSK}}(\rho)$ denotes the information rate achieved using PSK inputs. PSK is, in general,
suboptimal when $\beta > 1$ because in PSK the peak power and the average power are the
same. Even when $\beta = 1$, PSK is not always optimal. It is optimal for nonephemeral
channels because for $\beta = 1$ and $\lambda > 2$ the RHS of (24) agrees with the RHS of (9):

$$c_{\mathrm{PSK}} = c(1), \quad (\lambda \ge 2,\ \beta = 1). \tag{25}$$

For $1.5 \le \lambda < 2$, PSK is only optimal among IID input distributions:

$$c_{\mathrm{PSK}} = c_{\mathrm{IID}}(1) < c(1), \quad (1.5 \le \lambda < 2,\ \beta = 1). \tag{26}$$

And for $1 \le \lambda < 1.5$, PSK is not optimal even among IID input distributions:

$$c_{\mathrm{PSK}} < c_{\mathrm{IID}}(1) \le c(1), \quad (1 \le \lambda < 1.5,\ \beta = 1). \tag{27}$$

Consider the special case when the channel is memoryless ($\lambda = 1$): here, PSK does not
achieve any positive rates because it encodes all information in the phase of the transmit
signal; the memoryless channel completely wipes this information out by adding a phase
term that takes new independent realizations with time. Unlike PSK, IID inputs that use
amplitude modulation can achieve positive rates on the memoryless fading channel. Figure
2 compares $c(1)$, $c_{\mathrm{IID}}(1)$, and $c_{\mathrm{PSK}}$ as functions of $\lambda$.




Fig. 2. Comparison of $c(1)$, $c_{\mathrm{IID}}(1)$ and $c_{\mathrm{PSK}}$ as functions of $\lambda$ (no average-power constraint).



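• On the Nonasymptotic Bound: The nonasymptotic bound presented in Proposition 2.2 is
tight at low SNR in the sense that for any fixed $\beta \ge 1$ we have

$$\lim_{\rho \downarrow 0} \frac{U(\rho, \beta)}{\rho^2} = \lim_{\rho \downarrow 0} \frac{C(\rho, \beta)}{\rho^2}. \tag{28}$$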
It is also tight when $\rho > 0$ is held fixed and $\beta$ goes to infinity in the sense that

$$\lim_{\beta \to \infty} \frac{\beta}{\rho}\, U(\rho, \beta) = \lim_{\beta \to \infty} \frac{\beta}{\rho}\, C(\rho, \beta), \quad \rho > 0, \tag{29}$$

where the RHS of the above is given by [16]

$$\lim_{\beta \to \infty} \frac{\beta}{\rho}\, C(\rho, \beta) = 1 - \frac{I(\rho)}{\rho}, \quad \rho > 0. \tag{30}$$

Thus, our upper bound could be used as an alternative to the upper bound used in [16].
Note that in fixing $\rho$ and letting $\beta$ go to infinity we are holding the peak power fixed and
letting the allowed average power go to zero.

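To verify (29) one can compute the LHS of (29) and then show that it equals the RHS
of (30). This can be done by noting that for $\beta$ sufficiently large we have $\zeta(\rho, \beta) = 1/\beta$,
and thus

$$U(\rho, \beta) = \log\left(1 + \frac{\rho}{\beta}\right) - \frac{I(\rho)}{\beta}.$$

Multiplying by $\beta/\rho$ and letting $\beta \to \infty$ yields $1 - I(\rho)/\rho$, since $\beta \log(1 + \rho/\beta) \to \rho$.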



The upper bound $U(\rho, \beta)$ is found to be close to the channel capacity at nontrivial values
of SNR. As a demonstration, we numerically compare the upper bound to the following
lower bound derived in [17]. Let

$$L(\rho) = I(Z_0; Y_0 \mid Z_{-\infty}^{-1}, Y_{-\infty}^{-1}), \tag{31}$$

where the input $\{Z_k\}$ is an IID quadrature PSK process. The channel capacity satisfies

(32) 



C(p,(3) > max -L(^). 



In Figure 3, we graph the capacity bounds $U$ and $L$ for a Gauss-Markov channel with
correlation coefficient 0.99. Numerical integration is used to compute the lower bound. 
The peak-to-average ratio is set as 10. The bounds are found to be fairly tight. 
Fig. 3. Comparison of upper and lower bounds on capacity. Gauss-Markov channel with correlation coefficient 0.99.
Peak-to-average ratio $\beta = 10$.

B. MIMO channels with sum power constraints

For MIMO channels, we separately consider two different types of constraints on the input 
signal. The constraints are imposed either on sums across transmit antennas, or on individual 
transmit antennas. This subsection is devoted to MIMO channels with sum power constraints. 

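1) The Model: We consider a single-user discrete-time MIMO channel with $n_T$ transmit
antennas and $n_R$ receive antennas. The time-$k$ channel output $\mathbf{Y}_k \in \mathbb{C}^{n_R}$ is given by

$$\mathbf{Y}_k = \sqrt{\rho}\, \mathbf{H}_k \mathbf{Z}_k + \mathbf{W}_k, \tag{33}$$

where $\sqrt{\rho}\, \mathbf{Z}_k$ is the time-$k$ channel input, with $\rho > 0$ representing the peak SNR and $\mathbf{Z}_k \in \mathbb{C}^{n_T}$.
In the above, the multiplicative noise $\{\mathbf{H}_k\}_{k=-\infty}^{\infty}$ is a matrix-valued stochastic process such
that at every time instant $k \in \mathbb{Z}$, the random matrix $\mathbf{H}_k$ is an $n_R \times n_T$ complex random matrix.
The random vectors $\{\mathbf{W}_k\}_{k=-\infty}^{\infty}$ are IID random vectors, each consisting of $n_R$ independent
$\mathcal{N}_{\mathbb{C}}(0, 1)$ components. Thus, $\mathbf{W}_k \sim \mathcal{N}_{\mathbb{C}}(0, \mathsf{I}_{n_R})$, where $\mathsf{I}_{n_R}$ is the $n_R \times n_R$ identity matrix.
Denoting by $H_k^{(r,t)}$ the row-$r$ column-$t$ entry in $\mathbf{H}_k$, we can write the $r$-th element in $\mathbf{Y}_k$ as

$$Y_k^{(r)} = \sqrt{\rho} \sum_{t=1}^{n_T} H_k^{(r,t)} Z_k^{(t)} + W_k^{(r)}. \tag{34}$$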

As for SISO channels, we assume that $\{\mathbf{H}_k\}$ and $\{\mathbf{W}_k\}$ are independent, and that their joint
law does not depend on $\{\mathbf{Z}_k\}$. We further assume that for each pair $(r, t)$ satisfying

$$(r, t) \in \{1, \ldots, n_R\} \times \{1, \ldots, n_T\} \tag{35}$$

the scalar process $\{H_k^{(r,t)}\}_{k=-\infty}^{\infty}$ is a zero-mean, stationary, PCN process with autocorrelation
function $R_{r,t}(\cdot)$ and spectral density function $S_{r,t}(\cdot)$. We also assume that the $n_R \times n_T$ processes
corresponding to the different pairs $(r, t)$ satisfying (35) are independent. We finally assume
throughout this paper that the autocorrelation $R_{r,t}(\cdot)$ is square-summable for every antenna pair
$(r, t)$ satisfying (35) and define

$$\lambda_{r,t} \triangleq \sum_{\nu=-\infty}^{\infty} |R_{r,t}(\nu)|^2 = \int_{-\pi}^{\pi} S_{r,t}^2(\omega)\, \frac{d\omega}{2\pi}. \tag{36}$$

Definition 2.2: A MIMO channel is said to be nonephemeral if for every pair $(r, t)$ satisfying
(35), the fading process $\{H_k^{(r,t)}\}$ is nonephemeral, i.e., if

$$\lambda_{r,t} > 2R_{r,t}^2(0), \quad \text{for all } (r, t) \in \{1, \ldots, n_R\} \times \{1, \ldots, n_T\}. \tag{37}$$

Definition 2.3: A MIMO channel is said to be transmit separable if there are $n_T$ nonnegative
constants $\{\alpha_t : t \in \{1, \ldots, n_T\}\}$ and $n_R$ autocorrelation functions $\{R_r(\cdot) : r \in \{1, \ldots, n_R\}\}$
with corresponding spectral density functions $S_r(\cdot)$ such that

$$R_{r,t}(k) = \alpha_t R_r(k)$$

for all $(r, t) \in \{1, \ldots, n_R\} \times \{1, \ldots, n_T\}$ and $k \in \mathbb{Z}$.

Definition 2.3 says that a MIMO channel is transmit separable if, fixing any one receive
antenna, the channels from all the transmit antennas have the same law up to some scaling
constants.
2) The sum power constraints: The sum peak-power constraint on the channel inputs is
that the time-$k$ channel input $\mathbf{Z}_k$ must satisfy, with probability one,

$$\|\mathbf{Z}_k\|^2 \le 1, \quad k \in \mathbb{Z}, \tag{38}$$

where $\|\mathbf{Z}_k\|^2$ denotes the sum of the squared magnitudes of the components of $\mathbf{Z}_k$. The sum
average-power constraint is that

$$E[\|\mathbf{Z}_k\|^2] \le \frac{1}{\beta}, \quad k \in \mathbb{Z}. \tag{39}$$

The capacity of the channel under the sum power constraints (38) and (39) is denoted by
$C_S(\rho, \beta)$ and is given by

$$C_S(\rho, \beta) = \lim_{n \to \infty} \frac{1}{n} \sup I(\mathbf{Z}_1^n; \mathbf{Y}_1^n) \tag{40}$$

with the supremum taken over all distributions on $\mathbf{Z}_1^n$ satisfying (38) and (39). We further
define

$$c_S(\beta) \triangleq \lim_{\rho \downarrow 0} \frac{C_S(\rho, \beta)}{\rho^2} \tag{41}$$

when the limit exists.

3) Results on MIMO with sum power constraints: The asymptotic low SNR capacity of 
the MIMO channel under sum power constraints is given in the following proposition. 

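Proposition 2.4 (Asymptotic Capacity): For any $\beta \ge 1$, the limit in (41) exists and is given
by

$$c_S(\beta) = \frac{1}{2} \sup_{a \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} a_t R_{r,t}(0) \right)^2 \right\}, \tag{42}$$

where

$$\mathcal{A}(\beta) = \left\{ (a_1, \ldots, a_{n_T}) \in \mathbb{R}^{n_T} : a_t \ge 0 \ \forall t,\ \sum_{t=1}^{n_T} a_t \le \frac{1}{\beta} \right\}. \tag{43}$$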

For transmit separable channels, the above proposition simplifies to the following corollary.

Corollary 2.2 (Asymptotic Capacity, Transmit Separable): If the MIMO channel is transmit
separable, then

$$c_S(\beta) = \frac{\alpha_{\max}^2}{2} \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \{a\lambda_r - a^2 R_r^2(0)\}, \tag{44}$$

where

$$\alpha_{\max} \triangleq \max\{\alpha_1, \ldots, \alpha_{n_T}\}, \tag{45}$$

and for every $r$,

$$\lambda_r \triangleq \sum_{\nu=-\infty}^{\infty} |R_r(\nu)|^2. \tag{46}$$
Corollary 2.3 (Asymptotic Capacity, Transmit Separable, Nonephemeral, No Average Power Constraint):
If the channel is transmit separable and nonephemeral, and if no average-power constraint is
imposed ($\beta = 1$), then

$$c_S(1) = \alpha_{\max}^2 \sum_{r=1}^{n_R} \frac{\lambda_r - R_r^2(0)}{2}, \tag{47}$$

where $\lambda_r$ and $\alpha_{\max}$ are as defined in Corollary 2.2.
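For transmit separable channels the asymptote reduces to a one-dimensional concave maximization, which the sketch below evaluates in closed form. It assumes the reading of Corollary 2.2 in which the asymptote is $(\alpha_{\max}^2/2) \max_{0 \le a \le 1/\beta} \sum_r \{a\lambda_r - a^2 R_r^2(0)\}$; the function name and argument layout are ours.

```python
def c_sum_separable(alphas, lams, r0s, beta=1.0):
    """Low-SNR asymptote under sum power constraints, transmit separable case.

    alphas : per-transmit-antenna scaling constants (nonnegative)
    lams   : per-receive-antenna sums of |R_r(v)|**2 over all lags v
    r0s    : per-receive-antenna values R_r(0)
    beta   : peak-to-average power ratio (assumed >= 1)
    """
    if beta < 1.0:
        raise ValueError("expected beta >= 1")
    alpha_max = max(alphas)
    lam_sum = sum(lams)
    r0_sq_sum = sum(x * x for x in r0s)
    # Maximize a*lam_sum - a**2 * r0_sq_sum over 0 <= a <= 1/beta: the objective
    # is concave, so clip the unconstrained maximizer to the feasible interval.
    a_star = min(1.0 / beta, lam_sum / (2.0 * r0_sq_sum))
    return 0.5 * alpha_max ** 2 * (a_star * lam_sum - a_star ** 2 * r0_sq_sum)


if __name__ == "__main__":
    # Three identical receive branches: the asymptote is three times the SISO value.
    print(c_sum_separable([1.0, 1.0], [3.0, 3.0, 3.0], [1.0, 1.0, 1.0]))
```

When every branch is nonephemeral and $\beta = 1$, the clipped maximizer is $a = 1$ and the result coincides with the expression in Corollary 2.3; note also that only the largest $\alpha_t$ enters.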

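As for SISO channels, we also give a firm upper bound on $C_S(\rho, \beta)$.

Proposition 2.5 (A Firm Upper Bound on Capacity of MIMO with Sum Constraints): For any
$\rho > 0$ and $\beta \ge 1$,

$$C_S(\rho, \beta) \le U_S(\rho, \beta), \tag{48}$$

where

$$U_S(\rho, \beta) \triangleq \max_{a \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left\{ \log\left(1 + \rho \sum_{t=1}^{n_T} a_t R_{r,t}(0)\right) - \sum_{t=1}^{n_T} a_t I_{r,t}(\rho) \right\}, \tag{49}$$

$$I_{r,t}(\rho) \triangleq \int_{-\pi}^{\pi} \log\bigl(1 + \rho S_{r,t}(\omega)\bigr)\, \frac{d\omega}{2\pi}, \tag{50}$$

and $\mathcal{A}(\beta)$ is defined as in (43).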
4) Discussion: 

• Input Distributions: As the proof of Proposition 2.4 suggests, a distribution on $\mathbf{Z}_1^n$ that
achieves the capacity asymptotically is the following. At most one of the $n_T$ transmit
antennas is used during the whole transmission, with antenna $t$ being chosen with
probability $a_t$. For the chosen antenna, all the input symbols $Z_1^{(t)}, \ldots, Z_n^{(t)}$ have magnitude
one and their phases are chosen in such a way that each symbol is of mean zero and
different symbols are uncorrelated. If no antenna is chosen, then all antennas keep silent
during the whole transmission period.

In the case when the MIMO channel is transmit separable (Corollary 2.2), the distribution
on $\mathbf{Z}_1^n$ suggested in the proof is to use only the one strongest antenna (i.e., the $t$-th antenna
with $\alpha_t = \alpha_{\max}$). The signals sent by this antenna have the same law as those used for
SISO channels.

As for SISO channels, the suggested distributions for the above two cases (general and
transmit separable) on $\mathbf{Z}_1^n$ are not IID.

Finally, for transmit separable, nonephemeral channels with no average-power constraint 
(Corollary 2.3), the suggested input law is to use only the strongest antenna to send 
symbols that all have mean zero, magnitude one and that are uncorrelated in time. 

• Comparison with SISO Channels: We compare the asymptotic capacity of MIMO channels 
with sum power constraints to SISO channels. Consider the simple case when the MIMO 
channel satisfies 

$$R_{r,t}(k) = R(k), \quad k \in \mathbb{Z}, \tag{51}$$

for every antenna pair $(r, t)$, where $R(\cdot)$ is the autocorrelation function of the SISO channel
with which we compare the MIMO channel. Note that such a MIMO channel is transmit separable. The
asymptotic capacity of this MIMO channel follows from Corollary 2.2, and is given by

$$c_S(\beta) = \frac{n_R}{2} \max_{0 \le a \le 1/\beta} \{a\lambda - a^2\} = n_R \cdot c(\beta).$$

Thus we see that the channel capacity at low SNR grows linearly with the number of
receive antennas ($n_R$), but does not grow with the number of transmit antennas ($n_T$). The
former observation is easy to understand, because the received signal energy is linear 
in n R ; the latter is not surprising when we recall that an optimal input distribution for 
MIMO channels with sum power constraints is to only use one transmit antenna at any 
time. Intuitively, having multiple transmit antennas is not helpful at low SNR because 
any benefit due to diversity brought by multiple transmit antennas is nulled by the cost 
of tracking the additional fading processes. 



C. MIMO channels with individual power constraints 

The MIMO channel model we consider under individual power constraints is exactly the 
same as the model we consider under sum power constraints, as explained in II-B1. The 
difference is in the form of the power constraints. 

1) The individual power constraints: The individual peak-power constraint on the MIMO
channel is that the time-$k$ channel input of the $t$-th antenna must satisfy, with probability
one,

$$|Z_k^{(t)}| \le 1, \quad t \in \{1, \ldots, n_T\}, \quad k \in \mathbb{Z}. \tag{52}$$

The individual average-power constraint is that

$$E\bigl[|Z_k^{(t)}|^2\bigr] \le \frac{1}{\beta}, \quad t \in \{1, \ldots, n_T\}, \quad k \in \mathbb{Z}. \tag{53}$$

The capacity of the channel described in (33) (or (34)) under the individual power constraints
(52) and (53) is denoted as $C_I(\rho, \beta)$ and is given by

$$C_I(\rho, \beta) = \lim_{n \to \infty} \frac{1}{n} \sup I(\mathbf{Z}_1^n; \mathbf{Y}_1^n) \tag{54}$$

with the supremum taken over all distributions on $\mathbf{Z}_1^n$ satisfying (52) and (53). We define

$$c_I(\beta) \triangleq \lim_{\rho \downarrow 0} \frac{C_I(\rho, \beta)}{\rho^2} \tag{55}$$

when the limit exists.

2) Results on MIMO with individual power constraints: We have failed to derive the 
asymptotic capacity of a general MIMO fading channel with individual power constraints. 
Upper and lower bounds on the asymptotic capacity are presented in Section V. Here we 
present the asymptotic capacity for transmit separable channels. 

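Proposition 2.6 (Asymptotic Capacity — Transmit Separable): If the MIMO channel is transmit
separable, then the limit in (55) exists and is given by

\[ c_I(\beta) = \frac{\left( \sum_{t=1}^{n_T} \alpha_t \right)^2}{2} \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \left\{ a \lambda_r - a^2 R_r^2(0) \right\}. \tag{56} \]

The next corollary is a simpler case of the above proposition.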

Corollary 2.4 (Asymptotic Capacity — Transmit Separable, Nonephemeral, No Average Power Constraint): 
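If the channel is transmit separable and nonephemeral, and if no average-power constraint is
imposed ($\beta = 1$), then

\[ c_I(1) = \frac{\left( \sum_{t=1}^{n_T} \alpha_t \right)^2}{2} \sum_{r=1}^{n_R} \left( \lambda_r - R_r^2(0) \right). \tag{57} \]
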
3) Discussion:

• Input Distributions: As the proof of Proposition 2.6 shows, an input law that achieves the 
capacity asymptotically on a transmit separable MIMO channel is to send the same signal 
on all antennas, with the signal (on each antenna) having the distribution that achieves 
the low SNR capacity of SISO channels. If the signal common to the antennas is an FSK 
signal, this is the STORM input distribution [13]. As mentioned for the SISO channel, in [13]
the blocklength $n$ of the input is taken to be the blocklength of the channel model,
whereas here we let $n \to \infty$ to achieve the maximum capacity asymptote.

• Comparison with SISO Channels: We compare the asymptotic capacity of MIMO channels
with individual power constraints to SISO channels. Consider the case when the MIMO
channel satisfies (51) for every antenna pair, where $R(\cdot)$ is the autocorrelation function of
the SISO channel. Substituting $\alpha_t = 1$ for all $t$ in (56), we get the following expression for the
asymptotic capacity of this MIMO channel:

\[ c_I(\beta) = \frac{n_T^2 n_R}{2} \max_{0 \le a \le 1/\beta} \left\{ a\lambda - a^2 R^2(0) \right\} \tag{58} \]

\[ \phantom{c_I(\beta)} = n_T^2 n_R \cdot c(\beta). \tag{59} \]

The channel capacity grows linearly with the number of receive antennas ($n_R$); this is
like the case of sum peak power constraints and for similar reasons. The channel capacity
grows quadratically with the number of transmit antennas ($n_T$). Increasing the number
of transmit antennas reduces the pressure from the peak power constraint because the 
peak constraint is applied on individual antennas. This causes a gain in capacity. The 
quadratic dependence stems from the fact that, at vanishingly low peak and average 
power constraints, the capacity grows quadratically with the power constraints. A similar 
observation is made in [18] for the case that the peak constraint is held fixed as the SNR 
goes to zero, and a noncoherent block-fading MIMO channel. 

• Sum and Individual Power Constraints: We compare the asymptotic capacities of a 
transmit separable MIMO channel under sum and individual power constraints. The former 
is given by (Corollary 2.2)

\[ c_S(\beta) = \frac{\alpha_{\max}^2}{2} \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \left\{ a \lambda_r - a^2 R_r^2(0) \right\}. \]


For the case under individual power constraints, we note that the actually allowed (peak 
or average) transmit power is n T times that in the case under sum power constraints. 
Therefore, we are interested in the value (Proposition 2.6)

\[ \frac{\alpha_{\mathrm{ave}}^2}{2} \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \left\{ a \lambda_r - a^2 R_r^2(0) \right\}, \tag{60} \]

where $\alpha_{\mathrm{ave}}$ is the average of $(\alpha_1, \ldots, \alpha_{n_T})$. Thus, the asymptotic capacity under sum power
constraints is $(\alpha_{\max}/\alpha_{\mathrm{ave}})^2$ times that under individual power constraints, and is generally
larger than the latter. The two values are equal only when all the transmit antennas are
equally strong, i.e., $\alpha_1 = \cdots = \alpha_{n_T}$.
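
The comparison between the two kinds of constraints can be checked numerically. The sketch below evaluates the transmit separable expressions of Corollary 2.2, Proposition 2.6 and (60); the function names and the list-based representation of $(\alpha_t)$, $(\lambda_r)$ and $(R_r(0))$ are assumptions made only for this illustration.

```python
def inner_max(lams, r0s, beta):
    """max over a in [0, 1/beta] of sum_r (a*lam_r - a^2 * R_r(0)^2)."""
    lin = sum(lams)
    quad = sum(r0 * r0 for r0 in r0s)  # assumed positive
    a = min(lin / (2.0 * quad), 1.0 / beta)  # vertex of the concave quadratic, clipped
    return a * lin - a * a * quad


def c_sum(alphas, lams, r0s, beta):
    """Asymptote under sum power constraints (Corollary 2.2)."""
    return 0.5 * max(alphas) ** 2 * inner_max(lams, r0s, beta)


def c_individual(alphas, lams, r0s, beta):
    """Asymptote under individual power constraints (Proposition 2.6)."""
    return 0.5 * sum(alphas) ** 2 * inner_max(lams, r0s, beta)


def c_individual_equal_total_power(alphas, lams, r0s, beta):
    """Value (60): scaling rho by 1/n_T scales the rho^2 coefficient by 1/n_T^2."""
    return c_individual(alphas, lams, r0s, beta) / len(alphas) ** 2
```

With `alphas = [1.0, 0.5]`, the ratio `c_sum / c_individual_equal_total_power` equals $(\alpha_{\max}/\alpha_{\mathrm{ave}})^2 = (1/0.75)^2$, independently of the other parameters.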

D. SISO channels with delay spread 

1) The Model: A SISO channel with delay spread is described as follows. Its time-$k$
complex-valued channel output $Y_k \in \mathbb{C}$ is given by

\[ Y_k = \sqrt{\rho} \sum_{t=0}^{T-1} H_k^{(t)} Z_{k-t} + W_k, \tag{61} \]

where $\sqrt{\rho} Z_k$ is the time-$k$ complex-valued channel input; $\{W_k\}$ models the additive noise; and
$\{H_k^{(t)}\}$ models the fading in tap $t$. We again assume that $\{W_k\}$ is a sequence of IID random
variables of law $\mathcal{N}_{\mathbb{C}}(0, 1)$. The fading processes are assumed to be independent across the $T$
taps, but correlated in time within each tap, so that the $T$ processes $\{H_k^{(0)}\}, \ldots, \{H_k^{(T-1)}\}$ are
independent. The autocorrelation function of the fading in tap $t$ is denoted by $R_t(\cdot)$ and it is
assumed that it is square-summable and that it possesses a spectral density function $S_t(\cdot)$. We
define

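\[ \lambda_t \triangleq \sum_{\nu=-\infty}^{\infty} |R_t(\nu)|^2, \tag{62} \]

which, by Parseval's relation, equals the suitably normalized integral of $S_t^2(\omega)$ over one period.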



The following two definitions are analogous to those for MIMO channels. 
Definition 2.4: A SISO channel with delay spread is said to be nonephemeral if the $T$
fading processes $\{H_k^{(0)}\}, \ldots, \{H_k^{(T-1)}\}$ are all nonephemeral, i.e., if

\[ \lambda_t \ge 2 R_t^2(0), \quad \text{for all } t \in \{0, \ldots, T-1\}. \]

Definition 2.5: A SISO channel with delay spread is said to be delay separable if there are
nonnegative constants $\alpha_0, \ldots, \alpha_{T-1}$ and an autocorrelation function $R(\cdot)$ with corresponding
spectral density function $S(\cdot)$ such that

\[ R_t(k) = \alpha_t R(k) \]

for all $t \in \{0, \ldots, T-1\}$ and $k \in \mathbb{Z}$.

The definition says that a SISO channel with delay spread is delay separable if the fading
processes in all the taps have the same law up to some scaling constants. Note that if a channel is delay
separable, then it is nonephemeral if, and only if, $\lambda \ge 2R^2(0)$.

We assume that the input signals are subject to the same constraints as considered earlier 
for SISO channels with flat fading, i.e., that constraints (4) and (5) are imposed. The capacity
of this channel is denoted as $C_{DS}(\rho, \beta)$ and is given by

\[ C_{DS}(\rho, \beta) = \lim_{n \to \infty} \frac{1}{n} \sup I(Z_1^n; Y_1^n), \tag{63} \]

where the supremum is taken over all distributions on $Z_1^n$ that satisfy (4) and (5). We define

\[ c_{DS}(\beta) = \lim_{\rho \downarrow 0} \frac{C_{DS}(\rho, \beta)}{\rho^2} \tag{64} \]

when the limit exists.

2) Results on SISO with Delay Spread: We identify the asymptotic capacity of SISO 
channels with delay spread that are delay separable. 

Proposition 2.7 (Asymptotic Capacity — Delay Separable): If the SISO channel with delay 
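spread (61) is delay separable, then the limit in (64) exists and is given by

\[ c_{DS}(\beta) = \frac{\left( \sum_{t=0}^{T-1} \alpha_t \right)^2}{2} \max_{0 \le a \le 1/\beta} \left\{ a\lambda - a^2 R^2(0) \right\}, \tag{65} \]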



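where

\[ \lambda \triangleq \sum_{\nu=-\infty}^{\infty} |R(\nu)|^2, \tag{66} \]

which, by Parseval's relation, equals the suitably normalized integral of $S^2(\omega)$ over one period.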
Corollary 2.5 (Asymptotic Capacity — Delay Separable, Nonephemeral, No Average Power Constraint):
If the channel is delay separable and nonephemeral, and if no average-power constraint is
imposed ($\beta = 1$), then

\[ c_{DS}(1) = \frac{\left( \sum_{t=0}^{T-1} \alpha_t \right)^2}{2} \left( \lambda - R^2(0) \right). \tag{67} \]

3) Discussion: 

• Input Distributions: As the proof of Proposition 2.7 shows, a signaling scheme that
achieves the capacity asymptotically is to send FSK signals with a certain probability, and
to send the all-zero signal otherwise. Here the FSK signals can be described as follows.
The time-$k$ input is

\[ Z_k = \exp(i \cdot k\Omega), \]

where $i = \sqrt{-1}$ and $\Omega$ is a random variable, uniformly distributed over the set
$\{ 2\pi j / m : j \in \{0, \ldots, m-1\} \}$
for some integer $m > 1$. Note that in contrast to the SISO flat fading channels, for SISO
channels with delay spread it is in general not optimal to replace FSK with PSK.

• Relation with MIMO channels: An upper bound on $C_{DS}(\rho, \beta)$ is the capacity of the
following multiple-input single-output (MISO) channel with individual power constraints:

\[ Y_k = \sqrt{\rho} \sum_{t=0}^{T-1} H_k^{(t)} Z_k^{(t)} + W_k. \tag{68} \]

Here $\{W_k\}$ and $\{H_k^{(t)}\}$ are the same as in the SISO channel with delay spread we are
considering, and the input signals satisfy $|Z_k^{(t)}| \le 1$ with probability one and $\mathrm{E}[|Z_k^{(t)}|^2] \le 1/\beta$
for all $t$. Indeed, with the following additional condition, the channel (68) is the same
as (61):

\[ Z_k^{(t)} = Z_{k'}^{(t')} \quad \text{whenever } k - t = k' - t'. \tag{69} \]

Generally speaking, condition (69) is a very strong restriction on MISO channels. Therefore, the
capacity of the MISO channel with individual power constraints is usually not a tight upper
bound on $C_{DS}(\rho, \beta)$. However, as the proof of Proposition 2.7 shows, this upper
bound is tight in the low SNR limit for delay separable channels.

• Delay Spread does not Waste Energy at Low SNR: In a delay separable channel, the
actually received peak (average) signal power is $\left( \sum_{t=0}^{T-1} \alpha_t \right)$ times the received peak
(average) signal power in the corresponding SISO flat fading channel. Proposition 2.7
tells us that at low SNR, the asymptotic capacity of the delay separable channel is the 
same as that of a SISO flat fading channel with the same received power. Thus, having the 
power distributed in different taps does not reduce the channel capacity at low SNR. The 
delay spread channel is similar to a Gaussian channel with noise power which depends on 
the weighted sum of the past channel input powers. An analogous result for this heating 
up channel was observed in [27]. 






III. SISO CHANNELS 

In this section we shall prove the results given in II-A. We start with the upper bound.
Proof of Proposition 2.2: To prove that $C(\rho, \beta) \le U(\rho, \beta)$, where $U(\rho, \beta)$ is defined in
(12), it suffices to show that

\[ \frac{1}{n} I(Z_1^n; Y_1^n) \le U(\rho, \beta) \tag{70} \]

for all $n \in \mathbb{N}$ and all distributions on $Z_1^n$ satisfying the peak- and average-power constraints
(4) and (5). To this end, we use the chain rule of mutual information to write



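\[ \begin{aligned}
I(Z_1^n; Y_1^n) &= \sum_{k=1}^{n} I(Z_1^n; Y_k \mid Y_1^{k-1}) \\
&= \sum_{k=1}^{n} \left\{ I(Z_1^n, Y_1^{k-1}; Y_k) - I(Y_1^{k-1}; Y_k) \right\} \\
&\le \sum_{k=1}^{n} I(Z_1^n, Y_1^{k-1}; Y_k) \\
&= \sum_{k=1}^{n} I(Z_1^k, Y_1^{k-1}; Y_k)
\end{aligned} \tag{71} \]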

where the last equality follows because the channel has no feedback. To prove (70), it thus
suffices to show that

\[ I(Z_1^k, Y_1^{k-1}; Y_k) \le U(\rho, \beta), \quad k \in \mathbb{Z}. \tag{72} \]

By shifting the indices by $-k$ and adding random variables we have

\[ I(Z_1^k, Y_1^{k-1}; Y_k) \le I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0). \tag{73} \]

It thus follows by (73) that to prove (72) (and hence (70)), it suffices to prove

\[ I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le U(\rho, \beta) \tag{74} \]

for all distributions on $Z_{-\infty}^{0}$ satisfying the constraints (4) and (5). To that end, we write

\[ I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) = h(Y_0) - h(Y_0 \mid Z_{-\infty}^{0}, Y_{-\infty}^{-1}) \tag{75} \]

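and bound the two terms on the RHS separately. As to the first term we note that the variance
of $Y_0$ is given by

\[ \begin{aligned}
\mathrm{E}[|Y_0|^2] &= \mathrm{E}[|\sqrt{\rho} H_0 Z_0 + W_0|^2] \\
&= \mathrm{E}[|W_0|^2] + \mathrm{E}[|\sqrt{\rho} H_0 Z_0|^2] \\
&= 1 + \rho \mathrm{E}[|Z_0|^2],
\end{aligned} \]

so that the differential entropy of $Y_0$ is bounded by

\[ h(Y_0) \le \log \pi e \left( 1 + \rho \mathrm{E}[|Z_0|^2] \right). \tag{76} \]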

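To study the term $h(Y_0 \mid Z_{-\infty}^{0}, Y_{-\infty}^{-1})$, we note that when $z_0$ is known, the past channel inputs
and outputs provide information about $Y_0$ only through the prediction of $H_0$. So conditional
on $z_{-\infty}^{0}$ and $y_{-\infty}^{-1}$, $Y_0$ has the form

\[ Y_0 = \sqrt{\rho} \left( \hat{h}_0 + \tilde{H}_0 \right) z_0 + W_0, \tag{77} \]

where

\[ \hat{h}_0 = \mathrm{E}\left[ H_0 \mid z_{-\infty}^{-1}, y_{-\infty}^{-1} \right] \tag{78} \]

is the conditional expectation of $H_0$ conditional on $(z_{-\infty}^{-1}, y_{-\infty}^{-1})$, and where

\[ \tilde{H}_0 = H_0 - \hat{h}_0 \tag{79} \]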

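is the error in predicting $H_0$ based on $(z_{-\infty}^{-1}, y_{-\infty}^{-1})$. Conditional on $z_{-\infty}^{-1}$, since $Y_{-\infty}^{-1}$ and $H_0$
are jointly PCN, we have that $\tilde{H}_0$ is zero-mean PCN with variance that does not depend on
$y_{-\infty}^{-1}$. We thus conclude that conditional on $(z_{-\infty}^{0}, y_{-\infty}^{-1})$, $Y_0$ is PCN with mean $\sqrt{\rho} \hat{h}_0 z_0$ and
variance $\left( \rho \mathrm{E}[|\tilde{H}_0|^2 \mid z_{-\infty}^{-1}] \, |z_0|^2 + 1 \right)$. We next show that for all $z_{-\infty}^{-1}$

\[ \mathrm{E}\left[ |\tilde{H}_0|^2 \mid z_{-\infty}^{-1} \right] \ge \sigma^2(\rho), \tag{80} \]

and therefore

\[ h(Y_0 \mid Z_{-\infty}^{0}, Y_{-\infty}^{-1}) \ge \mathrm{E}\left[ \log \pi e \left( 1 + \rho |Z_0|^2 \sigma^2(\rho) \right) \right]. \tag{81} \]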



Inequality (80) is justified by noting that the prediction error (i.e., the variance of $\tilde{H}_0$) is
minimized when all the past inputs have maximum amplitude, i.e., $|z_{-1}| = |z_{-2}| = \cdots = 1$.
In this case, the estimation of $H_0$ based on $(Z_{-\infty}^{-1}, Y_{-\infty}^{-1})$ reduces to estimation based on
$\{\ldots, H_{-2} + N_{-2}, H_{-1} + N_{-1}\}$ where $N_k = W_k / (\sqrt{\rho} z_k)$, so $\{N_k\}$ are IID random variables of law
$\mathcal{N}_{\mathbb{C}}(0, 1/\rho)$ [22]. The error of the latter estimation is given by $\sigma^2(\rho)$. Combining (76) and (81)
we obtain



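\[ I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le \log\left( 1 + \rho \mathrm{E}[|Z_0|^2] \right) - \mathrm{E}\left[ \log\left( 1 + \rho \sigma^2(\rho) |Z_0|^2 \right) \right]. \tag{82} \]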



We next continue with the proof of (74) by further upper bounding the RHS of (82). Let
$a = \mathrm{E}[|Z_0|^2]$. By the average-power constraint (5),

\[ 0 \le a \le \frac{1}{\beta}. \tag{83} \]

By the concavity of the log function, the RHS of (82) is maximized over all distributions
satisfying the peak constraint (4) and the constraint $\mathrm{E}[|Z_0|^2] = a$ by

\[ |Z_0|^2 = \begin{cases} 1 & \text{with probability } a \\ 0 & \text{with probability } 1 - a. \end{cases} \]

Consequently, for some $a \in [0, 1/\beta]$,

\[ \begin{aligned}
I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) &\le \log(1 + \rho a) - a \log\left( 1 + \rho \sigma^2(\rho) \right) \\
&= \log(1 + \rho a) - a I(\rho).
\end{aligned} \tag{84} \]

Maximizing the RHS of (84) over all $a \in [0, 1/\beta]$ yields the optimal choice $a^* = \zeta(\rho, \beta)$ and
the maximum value $U(\rho, \beta)$, and thus establishes (74).
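
The final maximization over $a$ can be sketched numerically. The code below is an illustration only: it takes the prediction error $\sigma^2(\rho)$ as a given number `sigma2` (computing it from a spectral density is outside this sketch), and the closed-form stationary point $1/I(\rho) - 1/\rho$ is our own calculation from the first-order condition of the concave function $a \mapsto \log(1 + \rho a) - a I(\rho)$. The function name is hypothetical.

```python
import math


def upper_bound(rho, beta, sigma2):
    """Maximize log(1 + rho*a) - a*I over a in [0, 1/beta], with I = log(1 + rho*sigma2).

    sigma2 stands for the prediction error sigma^2(rho) and is assumed to lie in (0, 1].
    Returns (maximum value, maximizing a).
    """
    info = math.log(1.0 + rho * sigma2)      # I(rho) as it appears in (84)
    a_star = 1.0 / info - 1.0 / rho          # stationary point: rho / (1 + rho*a) = I
    a_star = min(max(a_star, 0.0), 1.0 / beta)  # clip to the feasible interval
    return math.log(1.0 + rho * a_star) - a_star * info, a_star
```

Because the objective is concave in $a$, clipping the stationary point to $[0, 1/\beta]$ gives the constrained maximizer; a brute-force grid search over $a$ is an easy way to cross-check the result for particular parameter values.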

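Proof of Proposition 2.1: The proof consists of two parts. The first part shows that

\[ \lim_{\rho \downarrow 0} \frac{U(\rho, \beta)}{\rho^2} = \begin{cases} \frac{\lambda^2}{8} & \text{if } \lambda \le \frac{2}{\beta} \\ \frac{\lambda}{2\beta} - \frac{1}{2\beta^2} & \text{if } \lambda > \frac{2}{\beta}. \end{cases} \tag{85} \]

This combines with Proposition 2.2 to prove that $c(\beta)$ is upper bounded by the RHS of (8).
The second part demonstrates that $c(\beta)$ is also lower bounded by the RHS of (8).
We begin with the first part. By (22) we have that

\[ \lim_{\rho \downarrow 0} \left( \frac{1}{I(\rho)} - \frac{1}{\rho} \right) = \frac{\lambda}{2}. \tag{86} \]

We study two different cases corresponding to $\lambda \le \frac{2}{\beta}$ and to $\lambda > \frac{2}{\beta}$. For the case $\lambda \le \frac{2}{\beta}$, we
have by (13) and (86)

\[ \lim_{\rho \downarrow 0} \zeta(\rho, \beta) = \frac{\lambda}{2}. \]

Thus, in this case,

\[ \begin{aligned}
\lim_{\rho \downarrow 0} \frac{U(\rho, \beta)}{\rho^2} &= \lim_{\rho \downarrow 0} \frac{\log\left( 1 + \rho \zeta(\rho, \beta) \right) - \zeta(\rho, \beta) I(\rho)}{\rho^2} \\
&= \frac{\lambda^2}{8}, \quad \left( \lambda \le \frac{2}{\beta} \right),
\end{aligned} \tag{87} \]

where the second equality follows by a second order Taylor expansion of the log function and
(22). In the case where $\lambda > \frac{2}{\beta}$, we have by (13) and (86) that

\[ \lim_{\rho \downarrow 0} \zeta(\rho, \beta) = \frac{1}{\beta}, \]

so that

\[ \begin{aligned}
\lim_{\rho \downarrow 0} \frac{U(\rho, \beta)}{\rho^2} &= \lim_{\rho \downarrow 0} \frac{\log\left( 1 + \frac{1}{\beta} \rho \right) - \frac{1}{\beta} I(\rho)}{\rho^2} \\
&= \lim_{\rho \downarrow 0} \frac{\frac{\rho}{\beta} - \frac{\rho^2}{2\beta^2} - \frac{1}{\beta} \left( \rho - \frac{\lambda}{2} \rho^2 \right) + o(\rho^2)}{\rho^2} \\
&= \frac{\lambda}{2\beta} - \frac{1}{2\beta^2}, \quad \left( \lambda > \frac{2}{\beta} \right).
\end{aligned} \tag{88} \]

The limits (87) and (88) establish (85).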

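We next turn to the second part. We shall now choose a joint distribution on $Z_1^n$ for every
$n \in \mathbb{N}$ and show that under this distribution

\[ \lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I(Z_1^n; Y_1^n)}{\rho^2} = \frac{1}{2} \max_{0 \le a \le 1/\beta} \left\{ a\lambda - a^2 \right\}, \tag{89} \]

where the RHS of (89) is equal to the RHS of (7) (or (8)). The expression on the LHS of
(89) indeed forms a lower bound on $c(\beta)$ because, by Lemma A.1, for any $n \in \mathbb{N}$ and any
distribution on $Z_1^n$ satisfying the peak- and average-power constraints (4) and (5),

\[ \frac{1}{n} I(Z_1^n; Y_1^n) \le C(\rho, \beta), \tag{90} \]

and therefore, for any $n \in \mathbb{N}$,

\[ \liminf_{\rho \downarrow 0} \frac{\frac{1}{n} I(Z_1^n; Y_1^n)}{\rho^2} \le \liminf_{\rho \downarrow 0} \frac{C(\rho, \beta)}{\rho^2}. \]

This inequality also holds in the limit as $n \to \infty$.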

For a fixed $n$, the proposed distribution on $Z_1^n$ can be described as follows:

\[ Z_k = U \cdot \Phi_k, \quad k \in \{1, \ldots, n\}, \]

where

\[ U = \begin{cases} 1 & \text{with probability } a \\ 0 & \text{with probability } 1 - a \end{cases} \]

for some $0 \le a \le \frac{1}{\beta}$, and $\Phi_1^n$ are random variables satisfying

\[ |\Phi_k| = 1, \quad \mathrm{E}[\Phi_k] = 0, \quad k \in \{1, \ldots, n\} \]

and

\[ \mathrm{E}[\Phi_k \Phi_{k'}^*] = 0, \quad k \ne k'. \]

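Examples of distributions on $\Phi_1^n$ have been given in Section II. Under the proposed distribution
on $Z_1^n$, the mutual information $I(Z_1^n; Y_1^n)$ when $\rho$ is small can be computed by applying
[11, Corollary 1], which is restated in this paper as Lemma B.1. We apply the lemma by
letting $Z = (Z_1, \ldots, Z_n)^{\mathsf{T}}$ and $H$ be the diagonal matrix with diagonal entries $H_1, \ldots, H_n$.
The calculation shows that

\[ \frac{1}{n} I(Z_1^n; Y_1^n) = \frac{\rho^2}{2} \left( a \cdot \frac{1}{n} \sum_{1 \le i, j \le n} |R(i - j)|^2 - a^2 \right) + o(\rho^2). \tag{91} \]

Noting that by (3)

\[ \begin{aligned}
\lim_{n \to \infty} \frac{1}{n} \sum_{1 \le i, j \le n} |R(i - j)|^2 &= \lim_{n \to \infty} \frac{1}{n} \sum_{\nu = -(n-1)}^{n-1} (n - |\nu|) |R(\nu)|^2 \\
&= \lim_{n \to \infty} \sum_{\nu = -(n-1)}^{n-1} \left( 1 - \frac{|\nu|}{n} \right) |R(\nu)|^2 \\
&= \lambda,
\end{aligned} \tag{92} \]

we obtain from (91) that

\[ \lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I(Z_1^n; Y_1^n)}{\rho^2} = \frac{1}{2} \left( \lambda a - a^2 \right). \tag{93} \]

Equality (89) follows when we choose $a^* \in [0, 1/\beta]$ that maximizes the RHS of (93).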
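
The convergence in (92) can be observed numerically. The following minimal sketch assumes, purely for illustration, a real geometric autocorrelation $R(k) = r^{|k|}$, for which $\lambda = (1 + r^2)/(1 - r^2)$; this choice of $R$ and the function names are not taken from the paper.

```python
def cesaro_double_sum(R, n):
    """(1/n) * sum over 1 <= i, j <= n of |R(i - j)|^2, the first expression in (92)."""
    return sum(abs(R(i - j)) ** 2 for i in range(n) for j in range(n)) / n


def cesaro_single_sum(R, n):
    """sum over |v| <= n - 1 of (1 - |v|/n) * |R(v)|^2, the rearranged form in (92)."""
    return sum((1.0 - abs(v) / n) * abs(R(v)) ** 2 for v in range(-(n - 1), n))


def low_snr_coefficient(lam, a):
    """Right-hand side of (93): 0.5 * (lam * a - a^2)."""
    return 0.5 * (lam * a - a * a)
```

For a fixed $R$, the two sums agree for every $n$, and the single-sum form increases towards $\lambda$ as $n$ grows, which is why the blocklength is sent to infinity to reach the full asymptote.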



We shall now prove Proposition 2.3. Before doing so, we present a lemma which studies
the problem of predicting the current fade $H_0$ based on the past channel inputs and outputs.
We have seen in the proof of Proposition 2.2 that if all the past inputs satisfy $|z_k| = 1$,
$k \in \{\ldots, -2, -1\}$, then this problem is reduced to predicting $H_0$ based on a noisy observation
of the past $H_k + N_k$, $k \in \{\ldots, -2, -1\}$, where $\{N_k\}$ is a sequence of IID PCN noise. As
shown in the next lemma, this problem becomes more difficult when $|z_k|$ is not always 1.

Lemma 3.1: If the autocorrelation function $R(\cdot)$ of the unit-variance fading process $\{H_k\}$ is
absolutely summable, and if the input symbols $\{Z_k\}$ satisfy the peak-power constraint (4),
then the conditional distribution of $H_0$ conditional on the past inputs $Z_{-\infty}^{-1} = z_{-\infty}^{-1}$ and outputs
$Y_{-\infty}^{-1} = y_{-\infty}^{-1}$ is PCN with a variance $\varsigma^2(\rho, z_{-\infty}^{-1})$ which does not depend on $y_{-\infty}^{-1}$, and $\varsigma^2(\rho, z_{-\infty}^{-1})$
satisfies

\[ \varsigma^2(\rho, z_{-\infty}^{-1}) = 1 - \rho \sum_{\nu = -\infty}^{-1} |R(\nu)|^2 |z_\nu|^2 + o(\rho), \tag{94} \]

where $o(\rho)$ is uniform in $z_{-\infty}^{-1}$.

Proof: See Appendix C. ■ 
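
The first-order behavior stated in the lemma can be checked numerically on a truncated problem. The sketch below is an illustration under several assumptions that are not in the paper: only a finite window of $m$ past samples is observed, the inputs are real, and the autocorrelation is the real geometric sequence $R(k) = r^{|k|}$. The exact conditional variance is then $1 - c^{\mathsf{T}} K^{-1} c$, where $K$ is the covariance matrix of the observed outputs and $c$ their cross-covariance with $H_0$. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def exact_variance(rho, z, r):
    """Variance of H_0 given Y_{-1}, ..., Y_{-m}, where Y_k = sqrt(rho) z_k H_k + W_k.

    z[i] is the (real) input at time -(i + 1); R(k) = r**|k| is an assumed autocorrelation.
    """
    m = len(z)
    K = [[rho * z[i] * z[j] * r ** abs(i - j) + (1.0 if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    c = [rho ** 0.5 * z[i] * r ** (i + 1) for i in range(m)]
    x = solve(K, c)
    return 1.0 - sum(ci * xi for ci, xi in zip(c, x))


def first_order(rho, z, r):
    """First-order expression from (94), restricted to the same finite window."""
    return 1.0 - rho * sum((r ** (i + 1)) ** 2 * z[i] ** 2 for i in range(len(z)))
```

In this truncated setting the exact variance is never smaller than the first-order expression, and the gap shrinks like $\rho^2$, consistent with the $o(\rho)$ remainder in (94).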
Proof of Proposition 2.3: Since we are interested in the fact that IID inputs do not generally 
achieve the channel capacity at low SNR, we shall concentrate on the proof of the upper 
bound. The achievability part can be proved by choosing an IID distribution on Z™ taking 
values in {0, ±1} and by then applying Lemma B.l. 
As in the proof of Proposition 2.2, to show that



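\[ c_{\mathrm{IID}}(\beta) \le \begin{cases} \frac{1}{8(2 - \lambda)} & \text{if } \lambda < 2 - \frac{\beta}{2} \\ \frac{1}{2\beta} + \frac{\lambda - 2}{2\beta^2} & \text{if } \lambda \ge 2 - \frac{\beta}{2}, \end{cases} \tag{95} \]

it suffices to show that for all IID distributions on $Z_{-\infty}^{0}$ satisfying the peak- and average-power
constraints (4) and (5),

\[ \limsup_{\rho \downarrow 0} \frac{I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0)}{\rho^2} \le \begin{cases} \frac{1}{8(2 - \lambda)} & \text{if } \lambda < 2 - \frac{\beta}{2} \\ \frac{1}{2\beta} + \frac{\lambda - 2}{2\beta^2} & \text{if } \lambda \ge 2 - \frac{\beta}{2}. \end{cases} \tag{96} \]
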

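To prove (96), we decompose $I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0)$ as in (75) and treat the two terms on the RHS
separately. The first term satisfies

\[ h(Y_0) \le \log \pi e \left( 1 + \rho \mathrm{E}[|Z_0|^2] \right). \tag{97} \]

To study the second term, we note that given $Z_0$, the past channel inputs and outputs provide
information about $Y_0$ only through the prediction of $H_0$. Denoting by $\hat{h}_0$ the conditional
expectation of $H_0$ given the past inputs and outputs, as in (78),
we can express $Y_0$ in the same form as (77). By Lemma 3.1, given $\hat{h}_0$ and $z_{-\infty}^{0}$, the distribution
of $\tilde{H}_0$ is PCN of variance $\varsigma^2(\rho, z_{-\infty}^{-1})$, thus the distribution of $Y_0$ is PCN of variance
$\left( 1 + \rho |z_0|^2 \varsigma^2(\rho, z_{-\infty}^{-1}) \right)$. So we have

\[ h(Y_0 \mid Z_{-\infty}^{0}, Y_{-\infty}^{-1}) = \mathrm{E}\left[ \log \pi e \left( 1 + \rho |Z_0|^2 \varsigma^2(\rho, Z_{-\infty}^{-1}) \right) \right]. \tag{98} \]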

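In the following calculations, let $a = \mathrm{E}[|Z_0|^2]$. Note that since $\{Z_k\}$ are IID, $a = \mathrm{E}[|Z_k|^2]$ for
all $k$. We obtain from (75), (97) and (98) that

\[ \begin{aligned}
I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) &\le \log \pi e \left( 1 + \rho \mathrm{E}[|Z_0|^2] \right) - \mathrm{E}\left[ \log \pi e \left( 1 + \rho |Z_0|^2 \varsigma^2(\rho, Z_{-\infty}^{-1}) \right) \right] \\
&= \left( \frac{\lambda - 2}{2} a^2 + \frac{1}{2} \mathrm{E}[|Z_0|^4] \right) \rho^2 + o(\rho^2) \\
&\le \left( \frac{\lambda - 2}{2} a^2 + \frac{1}{2} a \right) \rho^2 + o(\rho^2),
\end{aligned} \tag{99} \]

where the equality follows by calculations using the first order Taylor expansion of
$\rho \mapsto \varsigma^2(\rho, z_{-\infty}^{-1})$ (Lemma 3.1), the second order Taylor expansion of $x \mapsto \log(1 + x)$, and the fact
that $\{Z_k\}$ are IID; the last inequality follows because when $|Z_0| \le 1$, $\mathrm{E}[|Z_0|^4] \le \mathrm{E}[|Z_0|^2] = a$.
From (99) it follows that

\[ \limsup_{\rho \downarrow 0} \frac{I(Z_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0)}{\rho^2} \le \max_{0 \le a \le 1/\beta} \left\{ \frac{\lambda - 2}{2} a^2 + \frac{1}{2} a \right\}. \tag{100} \]

Inequality (96) (and thus (95)) follows from (100) because when $\lambda \ge 2 - \frac{\beta}{2}$, the maximum of
the RHS of (100) is $\frac{1}{2\beta} + \frac{\lambda - 2}{2\beta^2}$ and is achieved by $a = \frac{1}{\beta}$; when $\lambda < 2 - \frac{\beta}{2}$, the maximum is
$\frac{1}{8(2 - \lambda)}$ and is achieved by $a = \frac{1}{2(2 - \lambda)}$. ■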
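
The case distinction at the end of the proof can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function names; `general_asymptote` evaluates $\frac{1}{2} \max_{0 \le a \le 1/\beta} \{a\lambda - a^2\}$ from (89) for comparison, under the assumption of unit-variance fading.

```python
def iid_bound(lam, beta):
    """Maximize ((lam - 2)/2) * a^2 + a/2 over a in [0, 1/beta]; returns (value, a)."""
    if lam >= 2.0 - beta / 2.0:
        a = 1.0 / beta                      # maximum at the boundary of the interval
    else:
        a = 1.0 / (2.0 * (2.0 - lam))       # interior vertex of the concave parabola
    return 0.5 * (lam - 2.0) * a * a + 0.5 * a, a


def general_asymptote(lam, beta):
    """0.5 * max over a in [0, 1/beta] of (a*lam - a^2), the asymptote without the IID restriction."""
    a = min(lam / 2.0, 1.0 / beta)
    return 0.5 * (a * lam - a * a)
```

For example, with $\lambda = 1.8$ and $\beta = 1$ the IID bound evaluates to $0.4$ while the unrestricted asymptote is $0.405$, illustrating the gap that motivates the proposition.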
IV. MIMO CHANNELS WITH SUM POWER CONSTRAINTS

In this section we shall prove the results on MIMO channels with sum power constraints. 
We shall first prove the upper bound (Proposition 2.5) in two special cases, namely, for MISO 
channels and for single-input multiple-output (SIMO) channels, and then combine the proofs of 
these two special cases to prove Proposition 2.5 generally for MIMO channels. The asymptotic 
capacity results will then be proved with the help of Proposition 2.5. 

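We start with the upper bound on the capacity of the SIMO channel. Consider the channel (34)
with $n_T = 1$. We drop the superscript $(t)$ and rewrite the channel as

\[ Y_k^{(r)} = \sqrt{\rho} H_k^{(r)} Z_k + W_k^{(r)}, \quad r \in \{1, \ldots, n_R\}, \quad k \in \mathbb{Z}. \tag{101} \]

Similarly, below we write $I_r(\rho)$ instead of $I_{r,1}(\rho)$. The sum power constraints reduce to
constraints on the scalars $\{Z_k\}$:

\[ |Z_k| \le 1, \tag{102} \]

\[ \mathrm{E}[|Z_k|^2] \le \frac{1}{\beta}. \tag{103} \]

We denote the capacity of this channel by $C_{\text{SIMO-S}}(\rho, \beta)$. For this SIMO channel Proposition
2.5 reduces to the following lemma.

Lemma 4.1: The capacity $C_{\text{SIMO-S}}(\rho, \beta)$ is upper bounded by

\[ C_{\text{SIMO-S}}(\rho, \beta) \le U_{\text{SIMO-S}}(\rho, \beta), \tag{104} \]

where

\[ U_{\text{SIMO-S}}(\rho, \beta) = \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \left\{ \log\left( 1 + a \rho R_r(0) \right) - a I_r(\rho) \right\}. \tag{105} \]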
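Proof: Analogously to (74), to prove (104) it suffices to show that

\[ I(Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) \le U_{\text{SIMO-S}}(\rho, \beta) \tag{106} \]

for all distributions on $Z_{-\infty}^{0}$ satisfying (102) and (103). To prove (106), we expand its LHS
as

\[ I(Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) = h(\mathbf{Y}_0) - h(\mathbf{Y}_0 \mid Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}), \tag{107} \]

and proceed to bound the two terms separately. For $h(\mathbf{Y}_0)$ we have

\[ h(\mathbf{Y}_0) = h\left( Y_0^{(1)}, \ldots, Y_0^{(n_R)} \right) \le \sum_{r=1}^{n_R} h\left( Y_0^{(r)} \right). \tag{108} \]

We now consider $h(\mathbf{Y}_0 \mid Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1})$. Because there is no dependence between the $n_R$ fading
processes $\{H_k^{(r)}\}$, we have that, conditional on $Z_{-\infty}^{0}$ and $\mathbf{Y}_{-\infty}^{-1}$, the random variables
$Y_0^{(1)}, \ldots, Y_0^{(n_R)}$ are mutually independent. Therefore,

\[ \begin{aligned}
h(\mathbf{Y}_0 \mid Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}) &= \sum_{r=1}^{n_R} h\left( Y_0^{(r)} \mid \mathbf{Y}_{-\infty}^{-1}, Z_{-\infty}^{0} \right) \\
&= \sum_{r=1}^{n_R} h\left( Y_0^{(r)} \mid (Y^{(r)})_{-\infty}^{-1}, Z_{-\infty}^{0} \right),
\end{aligned} \tag{109} \]

where the second equality follows because, conditional on $(Y^{(r)})_{-\infty}^{-1}$ and $Z_{-\infty}^{0}$, the signal at
the $r$-th receive antenna $Y_0^{(r)}$ is independent of the past outputs of other antennas. From (107),
(108) and (109) we have

\[ \begin{aligned}
I(Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) &\le \sum_{r=1}^{n_R} \left\{ h\left( Y_0^{(r)} \right) - h\left( Y_0^{(r)} \mid (Y^{(r)})_{-\infty}^{-1}, Z_{-\infty}^{0} \right) \right\} \\
&= \sum_{r=1}^{n_R} I\left( Z_{-\infty}^{0}, (Y^{(r)})_{-\infty}^{-1}; Y_0^{(r)} \right).
\end{aligned} \tag{110} \]

For every $r \in \{1, \ldots, n_R\}$, the value of $I\left( Z_{-\infty}^{0}, (Y^{(r)})_{-\infty}^{-1}; Y_0^{(r)} \right)$ can be upper bounded in
the same way as (84), thus we have from (110)

\[ I(Z_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) \le \sum_{r=1}^{n_R} \left\{ \log\left( 1 + \rho R_r(0) a \right) - a I_r(\rho) \right\}, \]

where $a \triangleq \mathrm{E}[|Z_0|^2]$. Maximizing the RHS of this inequality over $a$ yields (106). ■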
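We now turn to the MISO channel. Consider the channel model in (34) when $n_R = 1$.
Dropping the superscript $(r)$ we rewrite the channel as

\[ Y_k = \sqrt{\rho} \sum_{t=1}^{n_T} H_k^{(t)} Z_k^{(t)} + W_k. \tag{111} \]

Similarly, below we write $I_t(\rho)$ instead of $I_{1,t}(\rho)$, and $\sigma_t^2(\rho)$ instead of $\sigma_{1,t}^2(\rho)$. Denote the
capacity of this channel under the sum constraints (38) and (39) by $C_{\text{MISO-S}}(\rho, \beta)$. Proposition
2.5 reduces to the following lemma.

Lemma 4.2: The capacity $C_{\text{MISO-S}}(\rho, \beta)$ is upper bounded by

\[ C_{\text{MISO-S}}(\rho, \beta) \le U_{\text{MISO-S}}(\rho, \beta) \tag{112} \]

where

\[ U_{\text{MISO-S}}(\rho, \beta) = \max_{\mathbf{a} \in \mathcal{A}(\beta)} \left\{ \log\left( 1 + \rho \sum_{t=1}^{n_T} R_t(0) a_t \right) - \sum_{t=1}^{n_T} a_t I_t(\rho) \right\}. \tag{113} \]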

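Proof: In analogy to the SISO case, to prove (112) it suffices to show that

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le U_{\text{MISO-S}}(\rho, \beta) \tag{114} \]

for all input distributions satisfying (38) and (39). To prove (114), we expand its LHS as

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) = h(Y_0) - h(Y_0 \mid \mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}) \tag{115} \]

and bound the two terms on the RHS of (115) separately. As in the SISO case, $h(Y_0)$ is upper
bounded by the differential entropy of a PCN random variable with the same variance as $Y_0$.
The variance of $Y_0$ is given by

\[ \mathrm{E}\left[ |Y_0|^2 \right] = \mathrm{E}\left[ |W_0|^2 \right] + \rho \sum_{t=1}^{n_T} \mathrm{E}\left[ |H_0^{(t)} Z_0^{(t)}|^2 \right] = 1 + \rho \sum_{t=1}^{n_T} R_t(0) \mathrm{E}\left[ |Z_0^{(t)}|^2 \right]. \]

Hence,

\[ h(Y_0) \le \log \pi e \left( 1 + \rho \sum_{t=1}^{n_T} R_t(0) \mathrm{E}\left[ |Z_0^{(t)}|^2 \right] \right). \tag{116} \]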
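We now consider the term $h(Y_0 \mid \mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1})$. To bound its value, we consider for every $k \in \mathbb{Z}$
the random variable $W'_k$ given by

\[ W'_k = \sum_{t=1}^{n_T} Z_k^{(t)} N_k^{(t)} + \sqrt{1 - \sum_{t=1}^{n_T} |Z_k^{(t)}|^2} \; N'_k, \tag{117} \]

where $\{N_k^{(1)}, \ldots, N_k^{(n_T)}\}$ and $N'_k$ are IID random variables of law $\mathcal{N}_{\mathbb{C}}(0, 1)$, which are independent
of the channel inputs and fading processes. It is easy to justify that, with our definition,
$\{W'_k\}$ is a sequence of IID $\mathcal{N}_{\mathbb{C}}(0, 1)$ random variables independent of the channel inputs and
the fading processes. Thus we may replace the additive noise $W_k$ with $W'_k$ for every $k \in \mathbb{Z}$ in
the channel model without actually changing the channel law. When we do this, the time-$k$
output $Y_k$ can be written as

\[ Y_k = \sum_{t=1}^{n_T} \left( \sqrt{\rho} H_k^{(t)} + N_k^{(t)} \right) Z_k^{(t)} + \sqrt{1 - \sum_{t=1}^{n_T} |Z_k^{(t)}|^2} \; N'_k. \tag{118} \]

Conditional on $\mathbf{Z}_0$ and on $(\sqrt{\rho} H_k^{(t)} + N_k^{(t)})$ for all $k \in \{\ldots, -2, -1\}$ and $t \in \{1, \ldots, n_T\}$, the
current output $Y_0$ is independent of $\mathbf{Z}_{-\infty}^{-1}$ and $Y_{-\infty}^{-1}$. Thus we have

\[ h(Y_0 \mid \mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}) \ge h\left( Y_0 \mid \mathbf{Z}_0, (\sqrt{\rho} \mathbf{H}_k + \mathbf{N}_k)_{k=-\infty}^{-1} \right). \tag{119} \]

Conditional on $\mathbf{Z}_0$, for every $t \in \{1, \ldots, n_T\}$, the values of $\{\sqrt{\rho} H_k^{(t)} + N_k^{(t)}\}_{k=-\infty}^{-1}$ provide
information about $Y_0$ only through the prediction of $H_0^{(t)}$. Furthermore, this prediction (and,
in particular, the prediction error) is independent between different $t$'s. The error in predicting
$H_0^{(t)}$ based on $(\sqrt{\rho} H_k^{(t)} + N_k^{(t)})_{k=-\infty}^{-1}$ is $\sigma_t^2(\rho)$. We thus obtain that, conditional on $\mathbf{Z}_0$ and
$(\sqrt{\rho} \mathbf{H}_k + \mathbf{N}_k)_{k=-\infty}^{-1}$, the random variable $Y_0$ is PCN with variance $1 + \rho \sum_{t=1}^{n_T} |Z_0^{(t)}|^2 \sigma_t^2(\rho)$.
Consequently, the conditional differential entropy $h\left( Y_0 \mid \mathbf{Z}_0, (\sqrt{\rho} \mathbf{H}_k + \mathbf{N}_k)_{k=-\infty}^{-1} \right)$ is

\[ h\left( Y_0 \mid \mathbf{Z}_0, (\sqrt{\rho} \mathbf{H}_k + \mathbf{N}_k)_{k=-\infty}^{-1} \right) = \mathrm{E}\left[ \log \pi e \left( 1 + \rho \sum_{t=1}^{n_T} |Z_0^{(t)}|^2 \sigma_t^2(\rho) \right) \right]. \tag{120} \]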



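From (115), (116), (119) and (120) it follows that

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le \log\left( 1 + \rho \sum_{t=1}^{n_T} R_t(0) \mathrm{E}\left[ |Z_0^{(t)}|^2 \right] \right) - \mathrm{E}\left[ \log\left( 1 + \rho \sum_{t=1}^{n_T} |Z_0^{(t)}|^2 \sigma_t^2(\rho) \right) \right]. \tag{121} \]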



We shall now maximize the RHS of the inequality over the distribution on $\mathbf{Z}_0$. Let $a_t = \mathrm{E}[|Z_0^{(t)}|^2]$
for all $t \in \{1, \ldots, n_T\}$. Due to the concavity of the log function, the expectation of
the log on the RHS of (121) is minimized when for all $t \in \{1, \ldots, n_T\}$, with probability $a_t$,

\[ |Z_0^{(t)}| = 1, \qquad |Z_0^{(t')}| = 0, \quad t' \ne t, \]

and with probability $\left( 1 - \sum_{t=1}^{n_T} a_t \right)$, $\mathbf{Z}_0 = \mathbf{0}$. This minimum value of the expectation of the
log is $\sum_{t=1}^{n_T} a_t I_t(\rho)$. Thus we have from (121)

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le \log\left( 1 + \rho \sum_{t=1}^{n_T} R_t(0) a_t \right) - \sum_{t=1}^{n_T} a_t I_t(\rho). \tag{122} \]

To prove (114), it remains to maximize the RHS of the above inequality over $\mathbf{a}$. Note that due
to the peak- and average-power constraints (38) and (39), $\mathbf{a} \in \mathcal{A}(\beta)$ where $\mathcal{A}(\beta)$ is defined in
(43). Thus the maximum value of the RHS of (122) is $U_{\text{MISO-S}}(\rho, \beta)$. ■
We now turn to prove the upper bound on the capacity of the MIMO channel with sum
power constraints.

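Proof of Proposition 2.5: As in the SISO case, to show $C_S(\rho, \beta) \le U_S(\rho, \beta)$, it suffices to
prove

\[ I(\mathbf{Z}_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) \le U_S(\rho, \beta) \tag{123} \]

for all distributions on $\mathbf{Z}_{-\infty}^{0}$ satisfying (38) and (39). As in (110) for the SIMO channel, the
LHS of (123) is upper bounded by

\[ I(\mathbf{Z}_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) \le \sum_{r=1}^{n_R} I\left( \mathbf{Z}_{-\infty}^{0}, (Y^{(r)})_{-\infty}^{-1}; Y_0^{(r)} \right). \tag{124} \]

By (122), when fixing $\mathrm{E}[|Z_0^{(t)}|^2] = a_t$ for all $t$, each summand on the RHS of (124) is upper
bounded by

\[ \log\left( 1 + \rho \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right) - \sum_{t=1}^{n_T} a_t I_{r,t}(\rho). \tag{125} \]

Thus we obtain from (124) and (125) that

\[ I(\mathbf{Z}_{-\infty}^{0}, \mathbf{Y}_{-\infty}^{-1}; \mathbf{Y}_0) \le \sum_{r=1}^{n_R} \left\{ \log\left( 1 + \rho \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right) - \sum_{t=1}^{n_T} a_t I_{r,t}(\rho) \right\}. \tag{126} \]

Note that for input distributions satisfying (38) and (39) we have $\mathbf{a} \in \mathcal{A}(\beta)$. Thus maximizing
the RHS of (126) over $\mathbf{a}$ yields the value $U_S(\rho, \beta)$. This establishes (123). ■
With the upper bound established, we now proceed to prove the results on the capacity
asymptote.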

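Proof of Proposition 2.4: The proof consists of two parts. The first part shows that

\[ \lim_{\rho \downarrow 0} \frac{U_S(\rho, \beta)}{\rho^2} = \frac{1}{2} \max_{\mathbf{a} \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right)^2 \right\}. \tag{127} \]

It then follows from Proposition 2.5 and (127) that

\[ \limsup_{\rho \downarrow 0} \frac{C_S(\rho, \beta)}{\rho^2} \le \frac{1}{2} \max_{\mathbf{a} \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right)^2 \right\}. \tag{128} \]

The second part of the proof shows that the RHS of (42) (which is the same as the RHS of
(127)) also forms a lower bound on $\liminf_{\rho \downarrow 0} \frac{C_S(\rho, \beta)}{\rho^2}$.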
To prove (127), we use the second order Taylor expansion of the log function and the
second order Taylor expansion of the function $I_{r,t}(\cdot)$ (21) to obtain

\[ \log\left( 1 + \rho \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right) - \sum_{t=1}^{n_T} a_t I_{r,t}(\rho) = \frac{\rho^2}{2} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right)^2 \right\} + o(\rho^2), \tag{129} \]

where the term $o(\rho^2)$ is uniform in $\mathbf{a}$. Now (127) follows by (129) and (49).

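We now start the second part of the proof. To derive a lower bound on the asymptotic
capacity, we shall find a distribution on $\mathbf{Z}_1^n$ for every $n \in \mathbb{N}$, such that

\[ \lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I(\mathbf{Z}_1^n; \mathbf{Y}_1^n)}{\rho^2} = \frac{1}{2} \max_{\mathbf{a} \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} R_{r,t}(0) a_t \right)^2 \right\}. \tag{130} \]

Note that by Lemma A.1, the LHS of (130) forms a lower bound on $\liminf_{\rho \downarrow 0} \frac{C_S(\rho, \beta)}{\rho^2}$. This
combined with (128) proves (127).

For every $n \in \mathbb{N}$ and every vector $\mathbf{a} \in \mathcal{A}(\beta)$, consider the input distribution

\[ Z_k^{(t)} = U^{(t)} \cdot \Phi_k, \]

where the random variables $\{U^{(1)}, \ldots, U^{(n_T)}\}$ are chosen such that with probability $a_t$,

\[ U^{(t)} = 1, \qquad U^{(t')} = 0, \quad t' \ne t, \]

and with probability $\left( 1 - \sum_{t=1}^{n_T} a_t \right)$,

\[ \mathbf{U} = \mathbf{0}. \]

The random variables $\Phi_1^n$ are chosen in the same way as for the SISO channels, i.e., they
satisfy

\[ |\Phi_k| = 1, \quad k \in \{1, \ldots, n\} \]

and

\[ \mathrm{E}[\Phi_k \Phi_{k'}^*] = 0, \quad k \ne k'. \]

It can be checked that this input distribution satisfies the sum power constraints (38) and (39).
To compute $I(\mathbf{Z}_1^n; \mathbf{Y}_1^n)$ for this distribution, we again use Lemma B.1. Calculation shows that

\[ \frac{1}{n} I(\mathbf{Z}_1^n; \mathbf{Y}_1^n) = \frac{\rho^2}{2} \sum_{r=1}^{n_R} \left[ \sum_{t=1}^{n_T} a_t \left( \frac{1}{n} \sum_{1 \le i, j \le n} |R_{r,t}(i - j)|^2 \right) - \left( \sum_{t=1}^{n_T} a_t R_{r,t}(0) \right)^2 \right] + o(\rho^2). \]

Similarly to (92), we have that by (21)

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{1 \le i, j \le n} |R_{r,t}(i - j)|^2 = \lambda_{r,t}. \]

Thus, for every $\mathbf{a} \in \mathcal{A}(\beta)$, under the input distributions chosen according to $\mathbf{a}$,

\[ \lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I(\mathbf{Z}_1^n; \mathbf{Y}_1^n)}{\rho^2} = \frac{1}{2} \sum_{r=1}^{n_R} \left\{ \sum_{t=1}^{n_T} a_t \lambda_{r,t} - \left( \sum_{t=1}^{n_T} a_t R_{r,t}(0) \right)^2 \right\}. \tag{131} \]

Choosing $\mathbf{a}$ to be the vector that achieves the maximum in (130) completes the proof. ■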
We shall now derive Corollary 2.2 from Proposition 2.4. 

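Proof of Corollary 2.2: If the channel is transmit separable, we have $\lambda_{r,t} = \alpha_t^2 \lambda_r$ and
$R_{r,t}(0) = \alpha_t R_r(0)$. Equation (42) reduces to

\[ c_S(\beta) = \frac{1}{2} \max_{\mathbf{a} \in \mathcal{A}(\beta)} \sum_{r=1}^{n_R} \left[ \lambda_r \sum_{t=1}^{n_T} a_t \alpha_t^2 - R_r^2(0) \left( \sum_{t=1}^{n_T} a_t \alpha_t \right)^2 \right]. \tag{132} \]

Assume without loss of generality that $\alpha_1 \ge \alpha_t$ for all $t \in \{2, \ldots, n_T\}$, i.e., that the first
transmit antenna is the strongest. We shall next show that, under this assumption, it is optimal
to concentrate all the transmit power on the first antenna. To be more precise, we shall show
that for any $\mathbf{a}$, if $\mathbf{a}'$ is given by

\[ a'_t = \begin{cases} \frac{\sum_{t'=1}^{n_T} a_{t'} \alpha_{t'}}{\alpha_1} & t = 1 \\ 0 & \text{otherwise}, \end{cases} \tag{133} \]

then

\[ \sum_{r=1}^{n_R} \left\{ \lambda_r \sum_{t=1}^{n_T} a'_t \alpha_t^2 - R_r^2(0) \left( \sum_{t=1}^{n_T} a'_t \alpha_t \right)^2 \right\} \ge \sum_{r=1}^{n_R} \left\{ \lambda_r \sum_{t=1}^{n_T} a_t \alpha_t^2 - R_r^2(0) \left( \sum_{t=1}^{n_T} a_t \alpha_t \right)^2 \right\}. \tag{134} \]

Note that $\mathbf{a}' \in \mathcal{A}(\beta)$ whenever $\mathbf{a} \in \mathcal{A}(\beta)$. Inequality (134) follows because, according to
(133),

\[ \sum_{t=1}^{n_T} a_t \alpha_t = \sum_{t=1}^{n_T} a'_t \alpha_t, \qquad \sum_{t=1}^{n_T} a_t \alpha_t^2 \le \sum_{t=1}^{n_T} a'_t \alpha_t^2. \]

Thus, we conclude that the maximization over $\mathcal{A}(\beta)$ in (132) can be reduced to a maximization
over the set $\left\{ \mathbf{a} : 0 \le a_1 \le \frac{1}{\beta}; \; a_t = 0, \; t = 2, \ldots, n_T \right\}$. This establishes (44). ■

Proof of Corollary 2.3: In (44), when $\beta = 1$ and $\lambda_r \ge 2R_r^2(0)$, the optimal choice of $a$ is
$a^* = 1$. ■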
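
The power-concentration step in the proof of Corollary 2.2 can be illustrated numerically. The sketch below evaluates the bracketed objective of (132) and applies the transformation (133); the function names and the list-based representation are assumptions made for this illustration, and the strongest antenna is located by index rather than assumed to be the first.

```python
def objective(a, alphas, lams, r0s):
    """sum_r { lam_r * sum_t a_t alpha_t^2 - R_r(0)^2 * (sum_t a_t alpha_t)^2 }, cf. (132)."""
    s1 = sum(at * al for at, al in zip(a, alphas))
    s2 = sum(at * al * al for at, al in zip(a, alphas))
    return sum(lam * s2 - r0 * r0 * s1 * s1 for lam, r0 in zip(lams, r0s))


def concentrate(a, alphas):
    """Move all power to the strongest antenna while preserving sum_t a_t alpha_t, cf. (133)."""
    best = max(range(len(alphas)), key=lambda t: alphas[t])
    s1 = sum(at * al for at, al in zip(a, alphas))
    out = [0.0] * len(a)
    out[best] = s1 / alphas[best]
    return out
```

For any nonnegative `a`, `concentrate(a, alphas)` uses no more total power than `a` and does not decrease `objective`, which is the content of (134).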



V. MIMO CHANNELS WITH INDIVIDUAL POWER CONSTRAINTS 

In this section we shall prove some capacity bounds for MIMO channels with individual 
power constraints, and then use these bounds to prove the main results given in Section II-C 
about such channels. 
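We shall first give an upper bound on the capacity that is valid for any SNR. To this end,
we introduce a few definitions. Let $\mathcal{D}$ be the set of all probability distributions on $\{1, \ldots, n_T\}$:

\[ \mathcal{D} = \left\{ \mathbf{d} = (d_1, \ldots, d_{n_T})^{\mathsf{T}} : d_t \ge 0, \; t \in \{1, \ldots, n_T\}; \text{ and } \sum_{t=1}^{n_T} d_t \le 1 \right\}. \tag{135} \]

Let $\mathcal{B}$ be the set of all length-$n_T$ binary sequences:

\[ \mathcal{B} = \{0, 1\}^{n_T}. \tag{136} \]

Further, for any $\beta$, let $\mathcal{P}(\beta)$ be a set of probability distributions on $\mathcal{B}$ defined as

\[ \mathcal{P}(\beta) = \left\{ \mathbf{p} : \sum_{\mathbf{b} \in \mathcal{B}} p_{\mathbf{b}} b_t \le \frac{1}{\beta}, \; t \in \{1, \ldots, n_T\} \right\}. \tag{137} \]

Proposition 5.1: For any $\rho > 0$, $\beta \ge 1$ and any $\mathbf{d} \in \mathcal{D}$,

\[ C_I(\rho, \beta) \le U_I(\rho, \beta, \mathbf{d}), \tag{138} \]

where

\[ U_I(\rho, \beta, \mathbf{d}) \triangleq \max_{\mathbf{p} \in \mathcal{P}(\beta)} \sum_{r=1}^{n_R} \left\{ \log\left( 1 + \rho \sum_{\mathbf{b} \in \mathcal{B}} p_{\mathbf{b}} \sum_{t=1}^{n_T} b_t R_{r,t}(0) \right) - \sum_{\mathbf{b} \in \mathcal{B}} p_{\mathbf{b}} \log\left( 1 + \rho \sum_{t=1}^{n_T} b_t \sigma_{r,t}^2\left( \frac{\rho}{d_t} \right) \right) \right\}, \tag{139} \]

where $\sigma_{r,t}^2(\infty)$ is taken to be 0.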

The proof of this upper bound is a combination of the proofs for MISO and SIMO channels. 
We shall give a proof of this bound for MISO channels. The bound for SIMO channels and 
for general MIMO channels can be proved in exactly the same way as in the sum power 
constraint case, therefore we omit these two parts. 

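For MISO channels ($n_R = 1$), the above proposition reduces to the following lemma.
Lemma 5.1: For any $\rho > 0$, $\beta \ge 1$ and any $\mathbf{d} \in \mathcal{D}$,

\[ C_{\text{MISO-I}}(\rho, \beta) \le U_{\text{MISO-I}}(\rho, \beta, \mathbf{d}), \tag{140} \]

where

\[ U_{\text{MISO-I}}(\rho, \beta, \mathbf{d}) = \max_{\mathbf{p} \in \mathcal{P}(\beta)} \left\{ \log\left( 1 + \rho \sum_{\mathbf{b} \in \mathcal{B}} p_{\mathbf{b}} \sum_{t=1}^{n_T} b_t R_t(0) \right) - \sum_{\mathbf{b} \in \mathcal{B}} p_{\mathbf{b}} \log\left( 1 + \rho \sum_{t=1}^{n_T} b_t \sigma_t^2\left( \frac{\rho}{d_t} \right) \right) \right\}. \tag{141} \]

Proof: In analogy to the SISO case, to prove (140), it suffices to show

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le U_{\text{MISO-I}}(\rho, \beta, \mathbf{d}) \tag{142} \]

for all input distributions satisfying the individual power constraints (52) and (53), and for
all $\mathbf{d} \in \mathcal{D}$. To this end, we follow the proof of Lemma 4.2, but, instead of $W'_k$ as defined in
(117), we introduce

\[ W'_k = \sum_{t=1}^{n_T} \sqrt{d_t} \, Z_k^{(t)} N_k^{(t)} + \sqrt{1 - \sum_{t=1}^{n_T} d_t |Z_k^{(t)}|^2} \; N'_k \]

for every $\mathbf{d} \in \mathcal{D}$ to replace the additive noise $W_k$. Here $\{N_k^{(1)}, \ldots, N_k^{(n_T)}\}$ and $N'_k$ are defined
in the same way as for (117), i.e., they are IID random variables of law $\mathcal{N}_{\mathbb{C}}(0, 1)$ and are
independent of the other channel variables. Instead of (118), we write $Y_k$ as

\[ Y_k = \sum_{t=1}^{n_T} \left( \sqrt{\rho} H_k^{(t)} + \sqrt{d_t} \, N_k^{(t)} \right) Z_k^{(t)} + \sqrt{1 - \sum_{t=1}^{n_T} d_t |Z_k^{(t)}|^2} \; N'_k. \]

Following the steps in the proof of Lemma 4.2 we have, instead of (121),

\[ I(\mathbf{Z}_{-\infty}^{0}, Y_{-\infty}^{-1}; Y_0) \le \log\left( 1 + \rho \sum_{t=1}^{n_T} R_t(0) \mathrm{E}\left[ |Z_0^{(t)}|^2 \right] \right) - \mathrm{E}\left[ \log\left( 1 + \rho \sum_{t=1}^{n_T} |Z_0^{(t)}|^2 \sigma_t^2\left( \frac{\rho}{d_t} \right) \right) \right]. \tag{143} \]

By the concavity of the log function, to maximize the RHS of (143) over distributions on
$\mathbf{Z}_0$, it suffices to consider the case when each input signal has either magnitude zero or one,
i.e., it suffices to consider the case when the vector $\left( |Z_0^{(1)}|^2, \ldots, |Z_0^{(n_T)}|^2 \right)$ takes value in
$\mathcal{B}$. Let $\mathbf{p}$ be the probability distribution of $\left( |Z_0^{(1)}|^2, \ldots, |Z_0^{(n_T)}|^2 \right) \in \mathcal{B}$. Note that, according
to the individual average-power constraint (53), $\mathbf{p}$ must satisfy $\mathbf{p} \in \mathcal{P}(\beta)$. We thus obtain
that maximizing the RHS of (143) yields $U_{\text{MISO-I}}(\rho, \beta, \mathbf{d})$ as defined in (141). This establishes
(142). ■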

The next corollary, which follows from Proposition 5.1, gives an upper bound on the 
asymptotic capacity of MIMO channels with individual power constraints. 

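Corollary 5.1: For any $\beta \ge 1$ and $d \in \mathcal{D}$ satisfying $d_t > 0$, $t \in \{1, \ldots, n_T\}$,

$$\limsup_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, \beta)}{\rho^2} \le \max_{p \in \mathcal{P}(\beta)} \frac{1}{2} \sum_{r=1}^{n_R} \Big\{ \sum_{b \in \mathcal{B}} p_b \Big[ \sum_{t=1}^{n_T} \frac{b_t \big(\lambda_{r,t} - R_{r,t}^2(0)\big)}{d_t} + \Big(\sum_{t=1}^{n_T} b_t R_{r,t}(0)\Big)^2 \Big] - \Big(\sum_{b \in \mathcal{B}} p_b \sum_{t=1}^{n_T} b_t R_{r,t}(0)\Big)^2 \Big\}. \qquad (144)$$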

Proof: Inequality (144) follows from (138) and (139) by application of the second-order Taylor expansion of the function $x \mapsto \log(1 + x)$ and the first-order Taylor expansion of $\sigma_{r,t}^2(\cdot)$. The latter is given by

$$\sigma_{r,t}^2(\rho) = R_{r,t}(0) - \frac{\lambda_{r,t} - R_{r,t}^2(0)}{2}\, \rho + o(\rho), \qquad (145)$$

which can be obtained from (19). ■
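Proof of Proposition 2.6: We first consider the upper bound (144) for transmit separable channels. For such channels, we have $R_{r,t}(0) = \alpha_t R_r(0)$ and $\lambda_{r,t} = \alpha_t^2 \lambda_r$. Choosing $d$ to be

$$d_t = \frac{\alpha_t}{\sum_{t'=1}^{n_T} \alpha_{t'}}, \quad t \in \{1, \ldots, n_T\},$$

and denoting

$$a(p) = \sum_{b \in \mathcal{B}} p_b\, \frac{\sum_{t=1}^{n_T} b_t \alpha_t}{\sum_{t=1}^{n_T} \alpha_t}, \qquad (146)$$

we obtain from (144)

$$\limsup_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, \beta)}{\rho^2} \le \frac{1}{2} \Big(\sum_{t=1}^{n_T} \alpha_t\Big)^2 \max_{p \in \mathcal{P}(\beta)} \sum_{r=1}^{n_R} \Big\{ R_r^2(0) \sum_{b \in \mathcal{B}} p_b \Big(\frac{\sum_{t=1}^{n_T} b_t \alpha_t}{\sum_{t=1}^{n_T} \alpha_t}\Big)^2 + \big(\lambda_r - R_r^2(0)\big) a(p) - R_r^2(0)\, a^2(p) \Big\}. \qquad (147)$$

Note that since

$$\frac{\sum_{t=1}^{n_T} b_t \alpha_t}{\sum_{t=1}^{n_T} \alpha_t} \le \frac{\sum_{t=1}^{n_T} \alpha_t}{\sum_{t=1}^{n_T} \alpha_t} = 1, \qquad (148)$$

the square of the LHS of (148) is less than or equal to itself. Thus from (146) and (148) we have

$$\sum_{b \in \mathcal{B}} p_b \Big(\frac{\sum_{t=1}^{n_T} b_t \alpha_t}{\sum_{t=1}^{n_T} \alpha_t}\Big)^2 \le a(p). \qquad (149)$$

From (147) and (149) we obtain

$$\limsup_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, \beta)}{\rho^2} \le \frac{1}{2} \Big(\sum_{t=1}^{n_T} \alpha_t\Big)^2 \max_{p \in \mathcal{P}(\beta)} \sum_{r=1}^{n_R} \big\{ a(p) \lambda_r - a^2(p) R_r^2(0) \big\}. \qquad (150)$$

Noting that the RHS of (150) depends on $p$ only through $a(p)$, which takes values in $[0, 1/\beta]$ for $p \in \mathcal{P}(\beta)$, we replace $a(p)$ by $a$. Then (150) reduces to

$$\limsup_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, \beta)}{\rho^2} \le \frac{1}{2} \Big(\sum_{t=1}^{n_T} \alpha_t\Big)^2 \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \big\{ a \lambda_r - a^2 R_r^2(0) \big\}. \qquad (151)$$

The RHS of (151) is the same as that of (56).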
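The step that bounds a mean of squares by a mean can be checked numerically. The sketch below assumes that $a(p)$ denotes the $p$-weighted average of the ratio $\sum_t b_t \alpha_t / \sum_t \alpha_t$ and that the gains $\alpha_t$ are positive; both function names are hypothetical.

```python
def a_of_p(p, alpha):
    """p-weighted average of (sum_t b_t*alpha_t) / (sum_t alpha_t).

    p maps binary tuples b to probabilities; alpha lists the
    (assumed positive) transmit gains alpha_t.
    """
    total_gain = sum(alpha)
    return sum(
        prob * sum(bt * at for bt, at in zip(b, alpha)) / total_gain
        for b, prob in p.items()
    )


def mean_square_ratio(p, alpha):
    """p-weighted average of the squared ratio.

    Each ratio lies in [0, 1], so its square does not exceed it and
    this quantity is at most a_of_p(p, alpha).
    """
    total_gain = sum(alpha)
    return sum(
        prob * (sum(bt * at for bt, at in zip(b, alpha)) / total_gain) ** 2
        for b, prob in p.items()
    )
```

With `alpha = [1.0, 3.0]` and `p` uniform over the four binary pairs, the ratios are 0, 0.75, 0.25 and 1, so the two averages are 0.5 and 0.40625 respectively.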

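To derive a lower bound on the asymptotic capacity, we propose an input distribution on $\mathbf{Z}_1^n$ for every $n \in \mathbb{N}$. Such distributions are given by

$$Z_k^{(1)} = \cdots = Z_k^{(n_T)} = U \cdot \Phi_k, \quad k \in \{1, \ldots, n\},$$

where $U$ and $\Phi_k$ are chosen in the same way as for SISO channels, as described in Section III. We apply Lemma B.1 to obtain

$$\lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I\big(\mathbf{Z}_1^n; \mathbf{Y}_1^n\big)}{\rho^2} = \frac{1}{2} \Big(\sum_{t=1}^{n_T} \alpha_t\Big)^2 \max_{0 \le a \le 1/\beta} \sum_{r=1}^{n_R} \big\{ a \lambda_r - a^2 R_r^2(0) \big\}. \qquad (152)$$

By Lemma A.1, this forms a lower bound on the asymptotic capacity.

Combining (151) and (152) establishes (56). ■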
Proof of Corollary 2.4: Note that when $\beta = 1$ and the channel is nonephemeral, i.e., $\lambda_r \ge 2 R_r^2(0)$ for all $r \in \{1, \ldots, n_R\}$, the choice of $a$ that maximizes the RHS of (56) is $a = 1$. Thus, in this case, (56) reduces to (57). ■
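The maximization over $a$ is a one-dimensional concave quadratic problem, which the following sketch solves in closed form. It assumes the objective is $\sum_r \{a \lambda_r - a^2 R_r^2(0)\}$ over $0 \le a \le 1/\beta$ and reads the nonephemeral condition as $\lambda_r \ge 2 R_r^2(0)$; the function name and argument layout are hypothetical.

```python
def best_duty_cycle(lams, r0_sq, beta):
    """Maximize sum_r (a*lam_r - a**2 * R_r(0)**2) over 0 <= a <= 1/beta.

    lams  : list of lambda_r values
    r0_sq : list of R_r(0)**2 values (assumed positive)
    Returns the maximizing a and the maximum value.
    """
    lam_sum = sum(lams)
    r_sum = sum(r0_sq)
    # Unconstrained maximizer of the concave quadratic in a.
    a_free = lam_sum / (2.0 * r_sum)
    # Project onto the feasible interval [0, 1/beta].
    a = min(max(a_free, 0.0), 1.0 / beta)
    return a, a * lam_sum - a * a * r_sum
```

When every $\lambda_r \ge 2 R_r^2(0)$ and $\beta = 1$, the unconstrained maximizer is at least 1, so the constraint is active and the function returns $a = 1$.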
For channels that are nonephemeral, with no average-power constraint ($\beta = 1$), but that are not necessarily transmit separable, we have the following upper and lower bounds that in general do not coincide.
Corollary 5.2: If the MIMO channel is nonephemeral and if no average-power constraint is imposed ($\beta = 1$), we have an upper bound on the capacity asymptote given by

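
$$\limsup_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, 1)}{\rho^2} \le \frac{n_T}{2} \sum_{r=1}^{n_R} \sum_{t=1}^{n_T} \big(\lambda_{r,t} - R_{r,t}^2(0)\big) \qquad (153)$$
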

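and a lower bound given by

$$\liminf_{\rho \downarrow 0} \frac{C_{\mathrm{I}}(\rho, 1)}{\rho^2} \ge \sum_{r=1}^{n_R} \sum_{m=1}^{\infty} \Big| \sum_{t=1}^{n_T} R_{r,t}(m) \Big|^2. \qquad (154)$$
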

Proof: The upper bound (153) is obtained by choosing $d = \frac{1}{n_T}(1, \ldots, 1)^{\mathsf{T}}$ in (144). The lower bound (154) is obtained by using the input distributions given in the proof of Proposition 2.6, with $U = 1$ with probability one. ■



VI. SISO CHANNELS WITH DELAY SPREAD 
In this section we shall prove Proposition 2.7 and Corollary 2.5. 

Proof of Proposition 2.7: As shown in Section II-D, the capacity of the SISO channel with 
delay spread (61) is upper bounded by the capacity of the MISO channel (68) with the same 
individual peak- and average-power constraints. The latter is obtained by choosing n R = 1 in 
(56), which yields the same value as the RHS of (65). Thus, to prove (65), it only remains to 
find a lower bound on the asymptotic capacity that coincides with its RHS. 

For every $n \in \mathbb{N}$, consider the following distribution on the input signals $Z_1^n$. Let

$$Z_k = U \cdot \Phi_k, \quad k \in \{1, \ldots, n\},$$

where $U$ is equal to 1 with probability $a$ and is equal to 0 with probability $(1 - a)$; $\Phi_k$ is chosen such that $\Phi_k = \exp(i \cdot k\Theta)$, where $i = \sqrt{-1}$; $\Theta$ is uniformly distributed over the set $\{2\pi j/m : j \in \{0, \ldots, m - 1\}\}$ for some $m > 1$. The asymptotic value of $I(Z_1^n; Y_1^n)$ for this input distribution is calculated using Lemma B.1 to yield

$$\lim_{n \to \infty} \lim_{\rho \downarrow 0} \frac{\frac{1}{n} I(Z_1^n; Y_1^n)}{\rho^2} = \frac{1}{2} \Big(\sum_{t} \alpha_t\Big)^2 \max_{0 \le a \le 1/\beta} \big\{ a \lambda - a^2 R^2(0) \big\}.$$

By Lemma A.1, this gives us the desired lower bound on the asymptotic capacity. ■
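The on-off tone input used in this proof is small enough to enumerate exactly. The sketch below assumes the phase grid $\{2\pi j/m\}$ and the normalization in which the peak power is 1 and the average power is at most $1/\beta$; the function names are hypothetical.

```python
import cmath
import math


def input_support(n, m, a):
    """Enumerate (probability, block) pairs for Z_k = U * exp(i*k*Theta).

    U = 1 with probability a and 0 otherwise; Theta is uniform on the
    (assumed) grid {2*pi*j/m : j = 0..m-1}; k runs over 1..n.
    """
    support = [(1.0 - a, [0j] * n)]  # U = 0: the all-zero block
    for j in range(m):
        theta = 2.0 * math.pi * j / m
        block = [cmath.exp(1j * k * theta) for k in range(1, n + 1)]
        support.append((a / m, block))
    return support


def peak_and_average_power(support):
    """Largest |Z_k|^2 on the support and the per-symbol average power."""
    n = len(support[0][1])
    peak = max(abs(z) ** 2 for _, block in support for z in block)
    avg = sum(p * sum(abs(z) ** 2 for z in block) for p, block in support) / n
    return peak, avg
```

Every nonzero symbol has unit magnitude, so the peak power is 1 and the average power equals $a$; choosing $a \le 1/\beta$ therefore respects both constraints under the stated normalization.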
Proof of Corollary 2.5: When the channel is nonephemeral ($\lambda \ge 2 R^2(0)$) and $\beta = 1$, the choice of $a$ that maximizes the RHS of (65) is $a = 1$. ■



Appendix A 
A Lower Bound on the Capacity 

In this section we present a general lower bound on the capacity of the SISO fading channel 
considered in this paper. This lemma can be extended to MIMO channels with sum power 
constraints or individual power constraints, and to SISO channels with delay spread. The proofs 
for these three cases are exactly the same as for SISO channels. 

Lemma A.1 (Lower Bound on Capacity): For any $n \in \mathbb{N}$ and any distribution on $Z_1^n$ satisfying the peak- and average-power constraints (4) and (5),

$$C(\rho, \beta) \ge \frac{1}{n} I(Z_1^n; Y_1^n). \qquad (155)$$

Proof: We extend the distribution on $Z_1^n$ to a distribution on $\{Z_k\}$ in such a way that the length-$n$ blocks of input symbols $\{Z_{kn+1}^{(k+1)n}\}_{k=0}^{\infty}$ are IID according to the law of $Z_1^n$. Clearly, if the given distribution on $Z_1^n$ satisfies Constraints (4) and (5), then so does the induced distribution on $\{Z_k\}$. We next show that under this distribution on $\{Z_k\}$,

$$\lim_{N \to \infty} \frac{1}{N} I(Z_1^N; Y_1^N) \ge \frac{1}{n} I(Z_1^n; Y_1^n), \qquad (156)$$

from which Lemma A.1 follows. To prove (156), we let $m = \lfloor N/n \rfloor$ and write

$$\begin{aligned}
\frac{1}{N} I(Z_1^N; Y_1^N) &\ge \frac{1}{N} I(Z_1^{mn}; Y_1^N) \\
&= \frac{1}{N} \sum_{k=0}^{m-1} I\big(Z_{kn+1}^{(k+1)n}; Y_1^N \,\big|\, Z_1^{kn}\big) \\
&= \frac{1}{N} \sum_{k=0}^{m-1} I\big(Z_{kn+1}^{(k+1)n}; Y_1^N, Z_1^{kn}\big) \\
&\ge \frac{1}{N} \sum_{k=0}^{m-1} I\big(Z_{kn+1}^{(k+1)n}; Y_{kn+1}^{(k+1)n}\big) \\
&= \frac{m}{N} I(Z_1^n; Y_1^n), \qquad (157)
\end{aligned}$$

where the first inequality follows by omitting terms in the mutual information; the next equality by the chain rule; the next equality because the input symbols in different blocks are mutually independent; the next inequality again by omitting terms in the mutual information; and the last equality because every block of input symbols has the same distribution as the first block ($k = 0$). Inequality (156) follows from (157) because

$$\lim_{N \to \infty} \frac{m}{N} = \frac{1}{n}.$$
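The final limit in this proof is elementary, and a short numerical sketch makes the rate visible. It assumes $m$ is the number of complete length-$n$ blocks among the first $N$ symbols; the function name is hypothetical.

```python
def block_fraction(N, n):
    """Return m/N, where m = floor(N/n) counts the complete
    length-n blocks contained in the first N symbols."""
    m = N // n
    return m / N
```

Since $N/n - 1 < m \le N/n$, the returned value lies in $(1/n - 1/N,\ 1/n]$ and therefore tends to $1/n$ as $N$ grows.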



Appendix B 

Second-order asymptotics of mutual information 
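In this section we restate a special case of [11, Corollary 1]. Consider the following channel

$$\mathbf{Y} = \sqrt{\rho}\, \mathbb{H} \mathbf{Z} + \mathbf{W}, \qquad (158)$$

where $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{W}$ are random $n$-vectors and $\mathbb{H}$ is an $n \times n$ random matrix. The entries of $\mathbb{H}$ can be correlated with each other, and are assumed to be of mean zero and jointly PCN. The coordinates of the additive noise vector $\mathbf{W}$ are IID random variables of law $\mathcal{N}_{\mathbb{C}}(0, 1)$.
Lemma B.1: If there exist $\delta_0 > 0$ and $\nu > 0$ such that

$$\Pr\big[\|\mathbf{Z}\|^2 > \delta\big] \le \exp\{-\delta^{\nu}\} \qquad (159)$$

for all $\delta > \delta_0$, then

$$I(\mathbf{Z}; \mathbf{Y}) = \frac{\rho^2}{2} \operatorname{trace}\Big\{ \mathsf{E}\Big[\big(\mathsf{E}\big[\mathbb{H} \mathbf{Z} \mathbf{Z}^{\dagger} \mathbb{H}^{\dagger} \,\big|\, \mathbf{Z}\big]\big)^2\Big] - \big(\mathsf{E}\big[\mathbb{H}\, \mathsf{E}[\mathbf{Z} \mathbf{Z}^{\dagger}]\, \mathbb{H}^{\dagger}\big]\big)^2 \Big\} + o(\rho^2).$$

Note that since in this paper we are considering channels with hard peak-power constraints ((4) or (38) or (52)), Condition (159) is always satisfied.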
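For a discrete input, the coefficient of $\rho^2$ in Lemma B.1 can be evaluated exactly. The sketch below specializes to a diagonal fading matrix $\mathbb{H} = \mathrm{diag}(H_1, \ldots, H_n)$ that is independent of the input, with $\mathsf{E}[H_i H_j^*] = K_{ij}$. This specialization, the function name and the data layout are assumptions made for illustration. Under it, the conditional covariance given $\mathbf{Z} = z$ has entries $z_i z_j^* K_{ij}$.

```python
def second_order_coefficient(support, K):
    """Coefficient of rho^2 in Lemma B.1 for diagonal fading.

    support : list of (probability, z) pairs, z a length-n sequence
    K       : n x n fading covariance, K[i][j] = E[H_i conj(H_j)]
    Returns 0.5 * (E[trace(M(Z)^2)] - trace((E[M(Z)])^2)), where
    M(z)[i][j] = z_i * conj(z_j) * K[i][j].
    """
    n = len(K)

    def conditional_cov(z):
        return [[z[i] * complex(z[j]).conjugate() * K[i][j]
                 for j in range(n)] for i in range(n)]

    def trace_of_square(M):
        return sum(M[i][j] * M[j][i]
                   for i in range(n) for j in range(n)).real

    mean = [[0j] * n for _ in range(n)]
    first = 0.0
    for prob, z in support:
        M = conditional_cov(z)
        first += prob * trace_of_square(M)
        for i in range(n):
            for j in range(n):
                mean[i][j] += prob * M[i][j]
    return 0.5 * (first - trace_of_square(mean))
```

For a single on-off symbol with on-probability $a$ and unit-variance fading, the coefficient is $(a - a^2)/2$. A deterministic input gives 0, as expected when the input carries no information.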
Appendix C
Proof of Lemma 3.1

In this section we shall prove Lemma 3.1. Define $Q$ by $Q = \sum_{m=-\infty}^{\infty} |R(m)|$. According to the assumptions of Lemma 3.1, we have that $Q$ is finite, that $R(0) = 1$, and that the past inputs $z_{-\infty}^{-1}$ satisfy the peak-power constraint (4). Let $\mathsf{K}$ denote the infinite matrix, with rows and columns indexed by the negative integers, with row-$\mu$ column-$\nu$ entry $R(\mu - \nu)$ for negative integers $\mu$ and $\nu$. Let $\mathsf{D}$ denote the infinite diagonal matrix with row-$\mu$ column-$\mu$ entry $z_\mu$, for negative integers $\mu$. Let $\mathbf{v}$ be the infinite column vector with $\mu$-th entry $R(\mu)$ for negative integers $\mu$. For $z_{-\infty}^{-1}$ fixed, $\mathsf{I} + \rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger}$ is the covariance matrix of the observation $Y_{-\infty}^{-1}$, and $\sqrt{\rho}\, \mathbf{v}^{\dagger} \mathsf{D}^{\dagger}$ is the covariance between the variable to be estimated, $H_0$, and the observation $Y_{-\infty}^{-1}$. Although $\mathsf{K}$ is an infinite matrix, its powers $\mathsf{K}^j$ are well defined in terms of absolutely convergent sums. Indeed, it is easy to show by induction on $j$ that $\max_{\mu, \nu} |(\mathsf{K}^j)_{\mu,\nu}| \le Q^{j-1}$ for any $j \ge 1$. In view of this fact, for sufficiently small $\rho$, $(\mathsf{I} + \rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^{-1}$ is well defined by an absolutely convergent series:

$$(\mathsf{I} + \rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^{-1} = \mathsf{I} + \sum_{j=1}^{\infty} (-\rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^j. \qquad (160)$$
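The series representation of the inverse in (160) can be sanity-checked on a finite matrix. The sketch below replaces the infinite matrix $\mathsf{D}\mathsf{K}\mathsf{D}^{\dagger}$ by a small real matrix `A` and truncates the series after a fixed number of terms; both simplifications, and the helper names, are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(A, rho, terms):
    """Approximate (I + rho*A)^{-1} by I + sum_{j=1}^{terms} (-rho*A)^j.

    The series converges when rho times the spectral radius of A is
    below 1; the truncation error then decays geometrically.
    """
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    step = [[-rho * A[i][j] for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(terms):
        power = mat_mul(power, step)  # (-rho*A)^j
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total
```

Multiplying the result by $\mathsf{I} + \rho A$ should return the identity up to the truncation error, which gives a direct check of the expansion for small $\rho$.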

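The orthogonality principle can be used to check that the optimal estimator can be represented by

$$\hat{H}_0 = \sqrt{\rho}\, \mathbf{v}^{\dagger} \mathsf{D}^{\dagger} (\mathsf{I} + \rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^{-1} Y_{-\infty}^{-1},$$

with the minimum mean square error given by

$$\sigma^2(\rho, z_{-\infty}^{-1}) = 1 - \rho\, \mathbf{v}^{\dagger} \mathsf{D}^{\dagger} (\mathsf{I} + \rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^{-1} \mathsf{D} \mathbf{v}. \qquad (161)$$

Substituting (160) into (161) yields:

$$\sigma^2(\rho, z_{-\infty}^{-1}) = 1 - \rho\, \mathbf{v}^{\dagger} \mathsf{D}^{\dagger} \mathsf{D} \mathbf{v} + \Delta,$$

where

$$\Delta = -\rho\, \mathbf{v}^{\dagger} \mathsf{D}^{\dagger} \Big( \sum_{j=1}^{\infty} (-\rho \mathsf{D} \mathsf{K} \mathsf{D}^{\dagger})^j \Big) \mathsf{D} \mathbf{v}.$$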

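Let $|\mathsf{K}|$ be the matrix obtained by replacing each entry of $\mathsf{K}$ by its magnitude, and define $|\mathbf{v}|$ and $|\mathsf{D}|$ similarly. Note that $|\mathsf{D}| \le \mathsf{I}$, and the sum of the entries of $|\mathbf{v}|$ is less than or equal to $Q$. Therefore, for $0 < \rho < 1/Q$,

$$\begin{aligned}
|\Delta| &\le \rho\, |\mathbf{v}|^{\mathsf{T}} |\mathsf{D}| \sum_{j=1}^{\infty} \big(\rho |\mathsf{D}| |\mathsf{K}| |\mathsf{D}|\big)^j |\mathsf{D}| |\mathbf{v}| \\
&\le \sum_{j=1}^{\infty} \rho\, |\mathbf{v}|^{\mathsf{T}} \big(\rho |\mathsf{K}|\big)^j |\mathbf{v}| \\
&\le \sum_{j=1}^{\infty} \rho^{j+1} Q^{j+1} = \frac{\rho^2 Q^2}{1 - \rho Q}.
\end{aligned}$$

Lemma 3.1 is proved.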